Graph Neural Networks for Decentralized Multi-Agent Perimeter Defense

In this work, we study the problem of decentralized multi-agent perimeter defense, which asks for computing actions for defenders with local perceptions and communications to maximize the capture of intruders. One major challenge for practical implementations is to make perimeter defense strategies scalable to large-scale problem instances. To this end, we leverage graph neural networks (GNNs) to develop an imitation learning framework that learns a mapping from the defenders' local perceptions and their communication graph to their actions. The proposed GNN-based learning network is trained by imitating a centralized expert algorithm such that the learned actions are close to those generated by the expert algorithm. We demonstrate that our proposed network performs close to the expert algorithm and is superior to other baseline algorithms in that it captures more intruders. Our GNN-based network is trained at a small scale and can be generalized to large-scale cases. We run perimeter defense games in scenarios with different team sizes and configurations to demonstrate the performance of the learned network.


INTRODUCTION
The problem of perimeter defense games considers a scenario where the defenders are constrained to move along a perimeter and try to capture the intruders, while the intruders aim to reach the perimeter without being captured by the defenders. A number of previous works have solved this problem with engagements on a planar game space (Shishika and Kumar, 2018; Chen et al., 2021). However, in the real world, the perimeter may be represented by a three-dimensional shape, as the players (e.g., defenders and intruders) may have the ability to perform three-dimensional motions. For example, a perimeter of a building that defenders aim to protect can be enclosed by a hemisphere. As a result, the defender robots should be able to move in three-dimensional space. Aerial robots have been well studied in various settings (Nguyen et al., 2019; Lee et al., 2016, 2020a), and all these settings can be real-world use cases for perimeter defense; for instance, intruders may try to attack a military base in a forest while defenders aim to capture the intruders.
In this work, we tackle the perimeter defense problem in a domain where multiple agents collaborate to accomplish a task. Multi-agent collaboration has been explored in many areas, including environmental mapping (Thrun et al., 2000) and search and rescue (Miller et al., 2020; Baxter et al., 2007). Our main contribution is robust perimeter defense performance with scalability: we demonstrate that our methods perform close to an expert policy (i.e., the maximum matching algorithm of Chen et al. (2014)) and are superior to other baseline algorithms. Our proposed networks are trained at a small scale and can be generalized to large scales.

Perimeter Defense
In a perimeter defense game, defenders aim to capture intruders by moving along a perimeter, while intruders try to reach the perimeter without being captured by the defenders. We refer the reader to the survey literature for a detailed overview. Many previous works dealt with engagements on a planar game space (Shishika and Kumar, 2018; Macharet et al., 2020; Chen et al., 2021; Bajaj et al., 2021; Hsu et al., 2022). For example, a cooperative multiplayer perimeter-defense game was solved on a planar game space in (Shishika and Kumar, 2018). In addition, an adaptive partitioning strategy based on intruder arrival estimation was proposed in (Macharet et al., 2020). Later, a formulation of the perimeter defense problem as an instance of flow networks was proposed in (Chen et al., 2021). Further, an engagement in a conical environment was discussed in (Bajaj et al., 2021), and a model with heterogeneous teams was addressed in (Hsu et al., 2022).
High-dimensional extensions of the perimeter defense problem have been explored recently (Lee and Bakolas, 2021; Yan et al., 2022; Lee et al., 2020b, 2022a). For example, Lee and Bakolas (2021) analyzed the two-player differential game of guarding a closed convex target set from an attacker in high-dimensional Euclidean spaces. Yan et al. (2022) studied a 3D multiplayer reach-avoid game where multiple pursuers defend a goal region against multiple evaders. Lee et al. (2020b, 2022a) considered a game played between an aerial defender and a ground intruder. All of the aforementioned works focus on solving centralized perimeter defense problems, which assume that players have global knowledge of other players' states. However, decentralized control becomes a necessity as we reach a large number of players. To remedy this problem, Velhal et al. (2022) formulated the perimeter defense game as a decentralized multi-robot spatio-temporal multitask assignment problem on the perimeter of a convex shape. Paulos et al. (2019) proposed a neural network architecture for training decentralized agent policies on the perimeter of a unit circle, where defenders have simple binary action spaces. Different from the aforementioned works, we focus on a high-dimensional perimeter, specialized to a hemisphere, with a continuous action space. We solve multi-agent perimeter defense problems by learning decentralized strategies with graph neural networks.

Graph Neural Networks
We leverage graph neural networks as the learning paradigm because of their desirable properties of a decentralized architecture that captures the interactions between neighboring agents and transferability that allows for generalization to previously unseen cases (Ruiz et al., 2021). In addition, GNNs have shown great success in various multi-robot problems such as formation control (Tolstaya et al., 2019), path planning (Li et al., 2021), task allocation (Wang and Gombolay, 2020), and multi-target tracking (Sharma et al., 2022). Particularly, Tolstaya et al. (2019) utilized a GNN to learn a decentralized flocking behavior for a swarm of mobile robots by imitating a centralized flocking controller with global information. Later, Li et al. (2021) implemented GNNs to find collision-free paths for multiple robots from start positions to goal positions in obstacle-rich environments. They demonstrated that their decentralized path planner achieves near-expert performance with local observations and neighboring communication only, and can also be generalized to larger networks of robots. The GNN-based approach was also employed to learn solutions to combinatorial optimization problems in a multi-robot task scheduling scenario (Wang and Gombolay, 2020) and a multi-target tracking scenario (Sharma et al., 2022).

Motivation
Perimeter defense is a relatively new field of research. One particular challenge is that high-dimensional perimeters add spatial and algorithmic complexities for defenders executing their optimal strategies. Although many previous works considered engagements on a planar game space and derived optimal strategies for 2D motions, the extension towards high-dimensional spaces is unavoidable for practical applications of perimeter defense games in real-world scenarios. For instance, a perimeter of a building that defenders aim to protect can be enclosed by a generic shape, such as a hemisphere. Since defenders cannot pass through the building and are assumed to stay close to the building at all times, they are constrained to move along the surface of the dome, which leads to the "hemisphere perimeter defense game." The intruder moves on the base plane of the hemisphere, which implies a constant altitude while moving. The intruder's movement is constrained to 2D since it is assumed that intruders want to stay low in altitude to hide from the defenders in the real world.
It is worth noting that the hemisphere defense problem is more challenging to solve than a problem where both agents are allowed to move freely in 3D space. There are previous works in which both defenders and intruders could move in three-dimensional spaces (Yan et al., 2019, 2020, 2022), and in all of them the authors were able to explicitly derive the optimal solutions even in multi-robot scenarios. Although our problem limits the dynamics of the defenders to the surface of the hemisphere, this constraint makes finding an optimal solution intractable and challenging.

Hemisphere Perimeter Defense
We consider a hemispherical dome with radius R as the perimeter. The hemisphere constraint allows the defender to safely move around the perimeter (e.g., a building). In this game, we consider two sets of players: a defender D_i is constrained to move on the surface of the dome, while an intruder A_j is constrained to move on the ground plane. We drop the indices i and j when they are irrelevant. An instance of 10 vs. 10 perimeter defense is shown in Figure 1. The positions of the players in spherical coordinates are z_D = [ψ_D, φ_D, R] and z_A = [ψ_A, 0, r], where ψ and φ are the azimuth and elevation angles, which gives the relative position z = [ψ, φ, r], where ψ = ψ_A − ψ_D and φ = φ_D. The positions of the players can also be described in Cartesian coordinates as x_D and x_A. All agents move at unit speed, defenders capture intruders by closing within a small capture distance, and both defender and intruder are consumed during capture. An intruder wins if it reaches the perimeter (i.e., r(t_f) = R) at some time t_f without being captured by any defender at any time t < t_f. A defender wins by capturing an intruder or by preventing it from scoring indefinitely (i.e., φ(t) = ψ(t) = 0, r(t) > R). The main interest of this work is to maximize the number of captures by defenders, given a set of initial configurations.
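As a concrete illustration of these conventions, the following minimal sketch (our own illustration, not the authors' code; function names are hypothetical) forms the relative state and converts spherical coordinates [azimuth, elevation, radius] to Cartesian positions:

```python
import numpy as np

def relative_state(psi_D, phi_D, psi_A, r_A):
    """Relative state z = [psi, phi, r] between a defender on the dome
    and an intruder on the ground plane, using the paper's convention
    psi = psi_A - psi_D and phi = phi_D."""
    return np.array([psi_A - psi_D, phi_D, r_A])

def to_cartesian(psi, phi, rho):
    """Spherical [azimuth psi, elevation phi, radius rho] -> Cartesian."""
    return np.array([rho * np.cos(phi) * np.cos(psi),
                     rho * np.cos(phi) * np.sin(psi),
                     rho * np.sin(phi)])
```

A defender's Cartesian position is then `to_cartesian(psi_D, phi_D, R)`, and an intruder's is `to_cartesian(psi_A, 0.0, r)` with zero elevation.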

Optimal Breaching Point
Given z_D and z_A, we call the breaching point the point on the perimeter at which the intruder tries to reach the target, shown as B in Figure 2. We call the azimuth angle that forms the breaching point the breaching angle, denoted by θ, and the angle between (z_A − z_B) and the tangent line at B the approach angle, denoted by β. It is proved in (Lee et al., 2020b) that, given the current positions of defender z_D and intruder z_A as point particles, there exists a unique breaching point such that the optimal strategy for both defender and intruder is to move towards it, known as the optimal breaching point. The breaching angle and approach angle corresponding to the optimal breaching point are known as the optimal breaching angle, denoted by θ*, and the optimal approach angle, denoted by β*. As stated in (Lee et al., 2020b), although there exists no closed-form solution for θ* and β*, they can be computed at any time by solving two governing equations, the second of which is

θ* = ψ − β* + cos⁻¹(cos β* / r).  (2)

Target Time and Payoff Function
We call the target time the time to reach B, and define τ_D(z_D, z_B) as the defender target time, τ_A(z_A, z_B) as the intruder target time, and the payoff function as

p(z_D, z_A) = τ_D(z_D, z_B) − τ_A(z_A, z_B).

The defender reaches B faster if p < 0 and the intruder reaches B faster if p > 0. Thus, the defender aims to minimize p while the intruder aims to maximize it.

Optimal Strategies and Nash Equilibrium
It is proven in (Lee et al., 2020b) that the optimal strategies for both defender and intruder are to move towards the optimal breaching point at their maximum speed at any time. Let Ω and Γ be continuous strategies v_D and v_A that lead to B, so that τ_D(z_D, Ω) = τ_D(z_D, z_B) and τ_A(z_A, Γ) = τ_A(z_A, z_B), and let Ω* and Γ* be the optimal strategies that minimize τ_D(z_D, Ω) and τ_A(z_A, Γ), respectively. Then the optimality in the game is given as a Nash equilibrium:

p(Ω*, Γ) ≤ p(Ω*, Γ*) ≤ p(Ω, Γ*).

Problem Definition
To maximize the number of captures during N vs. N defense, we first recall the dynamics of a 1 vs. 1 perimeter defense game. It is proven in (Lee et al., 2020b) that the best action for the defender in a one-on-one game is to move towards the optimal breaching point (defined in Section 3.3). The defender reaches the optimal breaching point faster than the intruder does if the payoff p (defined in Section 3.4) is negative, and the intruder reaches it faster if p > 0. From this, we infer that maximizing the number of captures in N vs. N defense is the same as finding a matching between the defenders and intruders such that the number of assigned pairs with negative payoff is maximized. In an optimal matching, the number of negative payoffs stays the same throughout the overall game since the optimality in each game of defender-intruder pairs is given as a Nash equilibrium (see Section 3.5).
The expert assignment policy is a maximum matching (Shishika and Kumar, 2018; Chen et al., 2014). To execute this algorithm, we generate a bipartite graph with D and A as two sets of nodes (i.e., V = {1, 2, .., N}), and define the potential assignments between defenders and intruders as the edges. For each defender/node D_i in D, we find all the intruders/nodes A_j in A that are sensible by the defender and compute the corresponding payoffs p_ij for all the pairs. We say that D_i is strongly assigned to A_j if p_ij < 0. Using the edge set E given by maximum matching, we can maximize the number of strongly assigned pairs. For uniqueness, we choose a matching that minimizes the value of the game, defined as the sum of payoffs p_ij over E*, where E* is the subset of E with negative payoffs (i.e., p_ij < 0). This unique assignment ensures that the number of captures is maximized as early as possible. However, running an exhaustive search using the maximum matching algorithm can be very expensive as the team size increases. This method is combinatorial in nature and assumes centralized information with full observability. Instead, we aim to find decentralized strategies that use local perceptions {Z_i}_{i∈V} (see Section 4.1). To this end, we formalize the main problem of this paper as follows.
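For small teams, the expert assignment can be sketched as a brute-force search over permutations that first maximizes the number of negative-payoff pairs and then minimizes their summed payoff (the value of the game); this is an illustrative stand-in, under our own naming, for the maximum matching of Chen et al. (2014):

```python
import itertools
import numpy as np

def expert_assignment(P):
    """Brute-force expert policy for small N: among all defender-to-
    intruder permutations, maximize the count of strongly assigned pairs
    (p_ij < 0), breaking ties by minimizing their summed payoff.
    P[i, j] is the payoff between defender i and intruder j."""
    N = P.shape[0]
    best = None
    for perm in itertools.permutations(range(N)):
        pairs = [(i, j) for i, j in enumerate(perm) if P[i, j] < 0]
        # lexicographic key: more captures first, then smaller game value
        key = (-len(pairs), sum(P[i, j] for i, j in pairs))
        if best is None or key < best[0]:
            best = (key, pairs)
    return best[1]
```

The factorial search makes the centralized expert's cost at large N apparent, motivating the decentralized strategies below.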
PROBLEM 1 (Decentralized Perimeter Defense with Graph Neural Networks). Design a GNN-based learning framework to learn a mapping M from the defenders' local perceptions {Z i } i∈V and their communication graph G to their actions U, i.e., U = M({Z i } i∈V , G), such that U is as close as possible to action set U g selected by a centralized expert algorithm.
We describe in detail our learning architecture for solving Problem 1 in the following section.

METHOD
In this paper, we learn decentralized strategies for perimeter defense using graph neural networks. Inference of our approach runs in real time and scales to a large number of agents. We use an expert assignment policy to train a team of defenders who share information through communication channels. In Section 4.1, we introduce the perception module for processing the features that are input to the GNN. Learning the decentralized algorithm through the GNN and planning the candidate matching for the defenders are discussed in Section 4.2. The control of the defender team is explained in Section 4.3, and the training procedure is detailed in Section 4.4. The overall framework is shown in Figure 3. For the choice of architecture, we decouple the control module from the learning framework since directly learning the actions is unnecessary: learning an assignment between agents is sufficient, and the best actions can then be computed by the optimal strategies (Section 3.5).

Perception
In this section, we assume N aerial defenders and N ground intruders. Each defender D_i is equipped with a sensor and faces outward from the perimeter with a field of view FOV. The defenders' horizontal field of view FOV is chosen as π, assuming a fisheye-type camera.

Intruder features
For each i, a defender observes a set of intruders A_j, and the relative positions in spherical coordinates between D_i and A_j form the intruder features, where N_f^A and N_f^D are selected as the fixed numbers of closest detected intruders and neighboring defenders, respectively. Although a defender can detect any number of intruders within its sensing range, a fixed number of detections is selected so that the system is scalable. In a decentralized setting, a defender should be able to decide its action based on its local perceptions. We experimentally chose the fixed number N_f^A = 10 since an expert algorithm (i.e., the maximum matching) would always assign a defender to an intruder among its 10 closest intruders.

Defender features
To make the system scalable, we build communication with a fixed number of closest defenders. Each defender D_i communicates with nearby defenders D_j within its communication range r_c. For each i, the relative positions between D_i and D_j form the defender features, where N_f^D is the number of defender features. We selected N_f^D = 3 since communicating with many other robots would give every defender full information of the environment (i.e., centralized behavior), and 3 is the minimum number with which the robots can collect information in every direction if we assume the robots are scattered. If there are fewer than 10 detected intruders or 3 neighboring defenders, we hand over dummy values to fill up the perception input matrix. It is important to keep the input feature size constant since the feedforward perception layers cannot handle varying feature sizes.
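A minimal sketch of the fixed-size perception input, assuming 3-D relative positions and a dummy value of -1 for padding (both our own assumptions, not the paper's stated choices):

```python
import numpy as np

def build_perception(intruder_pos, defender_pos, n_fa=10, n_fd=3,
                     dummy=-1.0):
    """Fixed-size local perception: keep the n_fa closest intruders and
    n_fd closest defenders (relative positions), padding unused slots
    with a dummy sentinel so the input size never varies."""
    def fixed(rows, k):
        rows = sorted(rows, key=np.linalg.norm)[:k]  # closest first
        out = np.full((k, 3), dummy)
        if rows:
            out[:len(rows)] = np.asarray(rows)
        return out
    return np.concatenate([fixed(intruder_pos, n_fa),
                           fixed(defender_pos, n_fd)]).ravel()
```

With N_f^A = 10 and N_f^D = 3, every defender's input is a constant 13 × 3 = 39-dimensional vector regardless of how many agents it actually senses.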

Feature extraction
Feature extraction is performed by concatenating the relative positions of observed intruders and communicated defenders, forming the local perception Z_i. The extracted features are fed into a multi-layer perceptron (MLP) to generate the post-processed feature vector x_i, which is then exchanged among neighbors through communications.

Learning & Planning
We employ graph neural networks with K-hop communications. Defenders communicate their perceived features with neighboring robots. The communication graph G is formed by connecting nearby defenders within the communication range r_c, and the resulting adjacency matrix S is given to the graph neural networks.

Graph Shift Operation
We consider that each defender i ∈ V has a feature vector x_i ∈ R^F, indicating the post-processed information from D_i. By collecting the feature vectors x_i from all defenders, we have the feature matrix for the defender team D:

X = [x_1, . . . , x_N]^⊤ = [x^1, . . . , x^F] ∈ R^{N×F},  (6)

where x^f ∈ R^N, f ∈ [1, · · · , F], is the collection of feature f across all defenders. We conduct the graph shift operation for each D_i by a linear combination of its neighboring features, i.e., Σ_{j∈N_i} x_j. Hence, for all defenders D with graph G, the feature matrix X after the shift operation becomes SX, whose i-th row is [SX]_i = Σ_{j∈N_i} x_j. Here, the adjacency matrix S is called the Graph Shift Operator (GSO).
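A minimal sketch of the GSO and the shift operation, assuming a binary adjacency matrix built from pairwise distances (our own illustration):

```python
import numpy as np

def adjacency(positions, r_c=1.0):
    """Binary graph shift operator: defenders i and j are connected when
    their distance is within communication range r_c (no self-loops)."""
    P = np.asarray(positions, dtype=float)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    return ((D <= r_c) & (D > 0)).astype(float)

def graph_shift(S, X):
    """One shift operation: row i of SX is the sum of the feature
    vectors of defender i's neighbors, sum_{j in N_i} x_j."""
    return S @ X
```

Each application of `graph_shift` corresponds to one 1-hop communication exchange among the defenders.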

Graph Convolution
With the shift operation, we define the graph convolution by a linear combination of the shifted features on graph G via K-hop communication exchanges (Li et al., 2020):

A(X; S) = Σ_{k=0}^{K} S^k X H_k,  (8)

where H_k ∈ R^{F×G} represents the coefficients combining the F features of the defenders in the shifted feature matrix S^k X, with F and G denoting the input and output dimensions of the graph convolution. Note that S^k X = S(S^{k−1} X) is computed by means of k communication exchanges with 1-hop neighbors.

Graph Neural Network
Applying a point-wise non-linearity σ : R → R as the activation function to the graph convolution (Eq. 8), we define the graph perception as

X_ℓ = σ(A(X_{ℓ−1}; S)).  (9)

Then, we define a GNN module by cascading L layers of graph perceptions (Eq. 9), where the output feature of the previous layer ℓ − 1, X_{ℓ−1} ∈ R^{N×F_{ℓ−1}}, is taken as input to the current layer ℓ to generate its output feature X_ℓ. Recall that the input to the first layer is X_0 = X (Eq. 6). The output feature of the last layer, X_L ∈ R^{N×G}, obtained via K-hop communications, represents the exchanged and fused information of the defender team D.
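The graph convolution and the cascaded GNN layers can be sketched in NumPy as follows (an illustration of Eqs. 8 and 9 with ReLU, not the trained architecture):

```python
import numpy as np

def graph_conv(X, S, H_list):
    """K-hop graph convolution: sum_k S^k X H_k, where H_list[k] is the
    coefficient matrix H_k. S^k X is built up by repeated 1-hop
    exchanges S(S^{k-1} X)."""
    out = np.zeros((X.shape[0], H_list[0].shape[1]))
    shifted = X
    for H in H_list:
        out += shifted @ H
        shifted = S @ shifted  # one more hop of communication
    return out

def gnn(X, S, layers):
    """Cascade of L graph perceptions: X_l = ReLU(sum_k S^k X_{l-1} H_k),
    where `layers` is a list of per-layer H_lists."""
    for H_list in layers:
        X = np.maximum(graph_conv(X, S, H_list), 0.0)
    return X
```

In the trained network the H_k matrices are the learnable parameters; their shapes depend only on the feature widths, not on the team size N, which is what makes the architecture transferable across scales.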

Candidate matching
The output of the GNN, which represents the fused information from the K-hop communications, is then processed with another MLP to provide a candidate matching for each defender. Figure 3 shows a candidate matching instance with N_f^A = 6. Given a defender D_i, we find the N_f^A closest intruders and number them from 1 to N_f^A clockwise. Numbering the nearby intruders lets us interpret the network outputs to identify which intruders would be matched with which defenders; since there are no global IDs for the intruders and each defender learns a decentralized strategy from its local perception, we assign the IDs clockwise without loss of generality (counterclockwise or any other order would work equally well). The output of the multi-layer perceptron is an assignment likelihood L, which gives the probabilities of the N_f^A intruder candidates being matched with the given defender. For instance, the expert assignment likelihood L_i^g for D_i in Figure 3 would be [0.01, 0.01, 0.95, 0.01, 0.01, 0.01] if the third intruder (i.e., A_3) is matched with D_i by the expert policy (i.e., maximum matching). The planning module selects the intruder candidate A_j so that the matching pair (D_i, A_j) resembles the expert policy with the highest probability. It is worth noting that our approach renders a decentralized assignment policy given that only neighboring information is exchanged.
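Selecting a candidate from the network output can be sketched as a softmax over the MLP logits followed by an argmax (a simplification of the planning module; the function name is our own):

```python
import numpy as np

def select_candidate(logits):
    """Turn raw MLP outputs for the N_f^A candidates into an assignment
    likelihood L via a numerically stable softmax, then pick the most
    likely intruder candidate."""
    z = np.exp(logits - np.max(logits))
    L = z / z.sum()
    return int(np.argmax(L)), L
```

For the Figure 3 example, a network strongly favoring the third candidate would produce a likelihood concentrated on index 2 (0-based), and that candidate would be selected.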

Permutation Equivalence
It is worth noting that our proposed GNN-based learning approach is scalable due to permutation equivalence: a decentralized defender can decide its action based on local perceptions that consist of an arbitrary number of unnumbered intruders. An instance of a perimeter defense game illustrating this property is shown in Figure 4. The plots focus on a single defender while intruders gradually approach the perimeter as time passes; the same intruders are drawn in the same color across time stamps. Notice that a new light-blue intruder enters the field of view of the defender at t = 2, and a purple intruder begins to appear at t = 3. Although an arbitrary number of intruders is detected at each time, our system gives IDs to the intruders, shown as blue numbers in Figure 4. We number them clockwise, but any permutation (e.g., counterclockwise) would do, because graph neural networks perform label-independent processing; the numbering only serves to specify which intruders would be matched with which defenders from the network outputs. These IDs are arbitrary and can change across time stamps: the yellow intruder's ID is 2 at t = 1 but becomes 3 at t = 2, 3, and the red intruder's ID is 3 at t = 1 but changes to 4 at t = 2 and 5 at t = 3. In this way, we accommodate an arbitrary number of intruders, and thus our system is permutation equivalent.

Control
The matching output from Section 4.2 is input to the defender strategy module in Figure 3. This module handles all the matched pairs (D_i, A_j) and computes the optimal breaching point for each of the one-on-one hemisphere perimeter defense games (see Section 3.3). The defender strategy module collectively outputs the position commands, which point towards the optimal breaching points. The SO(3) command (Mellinger and Kumar, 2011), which consists of thrust and moment to control the robot at a low level, is then passed to the defender team D for control. The state dynamics of the defender-intruder pair are detailed in (Lee et al., 2020b). The defenders move based on the commands to close the perception-action loop. Notably, the expert assignment likelihood L^g would result in the expert action set U^g (defined in Problem 1).

Training Procedure
To train our proposed networks, we use imitation learning to mimic the expert policy given by maximum matching (explained in Section 3), which provides the optimal assignment likelihood L^g (described in Section 4.2) given the defenders' local perceptions {Z_i}_{i∈V} and the communication graph G. The training set D is generated as a collection of these data: D = {({Z_i}_{i∈V}, G, L^g)}. We train the mapping M (defined in Problem 1) to minimize the cross-entropy loss between L^g and L. We show that the trained M provides U close to U^g. The number of learnable parameters in our networks is independent of the team size N. Therefore, we can train our networks at a small scale and generalize the model to large scales, given that defenders at any scale learn decentralized strategies based on local perceptions of fixed numbers of agents.

Model Architecture
Our model architecture consists of a 2-layer MLP with 16 and 8 hidden units to generate the post-processed feature vector x_i, a 2-layer GNN with 32 and 128 hidden units to exchange the collected information among defenders, and a single-layer MLP to produce the assignment likelihood L. Each layer in the MLP and GNN is followed by a ReLU.

Graph Neural Networks Details
In implementing the graph neural networks, we construct a 1-hop connectivity graph by connecting defenders within communication range r_c = 1. Given that the default radius is R = 1, we foresee that three neighboring agents within 1 hop provide a sufficiently wide sensing region for the defenders. Accordingly, we assume that communications occur in real time with N_f^D = 3. Each defender gathers input features that consist of the N_f^A = 10 closest intruder positions and the N_f^D = 3 closest defender positions. The parameters used are summarized in Table 1.

Implementation Details
The experiments are conducted using a 12-core 3.50GHz i9-9920X CPU and an Nvidia GeForce RTX 2080 Ti GPU. We implement the proposed networks using PyTorch v1.10.1 (Paszke et al., 2019) accelerated with CUDA v10.2 APIs. We use the Adam optimizer with a momentum of 0.5. The learning rate is scheduled to decay from 5 × 10^−3 to 10^−6 over 1500 epochs with batch size 64, using cosine annealing. We chose these hyperparameters for the best performance.

Datasets
We evaluate our decentralized networks using imitation learning where the expert assignment policy is the maximum matching. The perimeter is a hemisphere with a radius R defined by R = √(N/N_def), where N is the team size and N_def is a default team size. Since running the maximum matching is very expensive at large scales (e.g., N > 10), we set the default team size to N_def = 10. In this way, R also represents the scale of the game; for instance, when N = 40, R becomes 2, which indicates that the scale of the problem setting is doubled compared to the setting with R = 1. Given the team size N = 10, our experimental arena has a dimension of 10 × 10 × 1 m. Offline, we randomly sample 10 million examples of the defenders' local perceptions Z_i and find the corresponding G and L^g to prepare the dataset, which is divided into a training set (60%), a validation set (20%), and a testing set (20%).

Metrics
We are mainly interested in the percentage of intruders caught (i.e., number of captures/total number of intruders). At small scales (e.g. N ≤ 10), an expert policy (i.e., the maximum matching) can be run and a direct comparison between the expert policy and our policy is available. At large scales (e.g. N > 10), the maximum matching is too expensive to run. Thus we compare our algorithm with other baseline approaches: greedy, random, and mlp, which will be explained in Section 5.3. To observe the scalability on small and large scales, we run a total of five different algorithms for each scale: expert, gnn, greedy, random, and mlp. In all cases, we compute the absolute accuracy, which is defined by the number of captures divided by the team size, to verify if our network can be generalized to any team size. Furthermore, we also calculate the comparative accuracy, defined as the ratio of the number of captures by gnn to the number of captures by another algorithm, to observe comparative results.

Compared Algorithms
For a fair comparison, in the baseline algorithms defenders do not communicate their "intentions" of which intruders will be captured by which neighboring defenders, since the GNN does not share such information either. In the GNN framework, each defender perceives nearby intruders, and the relative positions of perceived intruders, not the "intentions," are shared through communications; the power of the GNN is to learn these "intentions" implicitly via K-hop communications. As a result, decentralized decision-making (for both the GNN and the baselines) may allow multiple defenders to aim to capture the same intruder, whereas a centralized planner knows the "intentions" of all the defenders and would avoid such a scenario.

Greedy
The greedy algorithm runs in polynomial time and is thus a good candidate for comparison with our GNN approach. For a fair comparison, we run a decentralized greedy algorithm based on the local perception Z_i of D_i. We enable K-hop neighboring communications so that the sensible region of a defender is expanded as if the networking channels of the GNN were active. The defender D_i computes the payoff p_ij (see Section 3.4) for every sensible intruder A_j and greedily chooses the assignment that minimizes p_ij.
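A minimal sketch of this baseline, where `sensible[i]` (a hypothetical structure of our own) lists the intruders defender i can sense after the K-hop expansion:

```python
import numpy as np

def greedy_assignment(P, sensible):
    """Decentralized greedy baseline: each defender independently picks,
    among its sensible intruders, the one minimizing its payoff p_ij.
    No intentions are shared, so two defenders may pick the same
    intruder. P[i, j] is the payoff; sensible[i] lists defender i's
    sensible intruder indices."""
    choice = {}
    for i, cands in enumerate(sensible):
        if cands:
            choice[i] = min(cands, key=lambda j: P[i, j])
    return choice
```

The test below also exhibits the failure mode discussed above: two defenders greedily choosing the same intruder, which a centralized matching would avoid.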

Random
The random algorithm is similar to the greedy algorithm in that the K-hop neighboring communications are enabled for the expanded perception. Among sensible intruders, a defender D i randomly picks an intruder to determine the assignment.

MLP
For the MLP algorithm, we train only the MLPs of our proposed framework, excluding the GNN module. Comparing our GNN framework to this algorithm shows whether the GNN yields any improvement.

Results
We run the perimeter defense game in various scenarios with different team sizes and initial configurations to evaluate the performance of the learned networks. In particular, we conduct the experiments at small (N ≤ 10) and large (N > 10) scales. Snapshots of the simulated perimeter defense game in top view with our proposed networks for different team sizes are shown in Figure 5. The perimeter, defender state, intruder state, and breaching point are marked in green, blue, red, and yellow, respectively. We observe that intruders try to reach the perimeter: given the defender-intruder matches, the intruders execute their respective optimal strategies and move towards the optimal breaching points (see Section 3.5). If an intruder successfully reaches its breaching point without being captured by any defender, the intruder is consumed and leaves a marker labeled "Intrusion". If an intruder fails and is intercepted by a defender, both agents are consumed and leave a marker labeled "Capture". The points on the perimeter aimed at by intruders are marked as "Breaching point". In all runs, the game ends at terminal time T_f when all the intruders are consumed. See the supplemental video for more results.
As mentioned in Section 5.1, we run the five algorithms expert, gnn, greedy, random, and mlp at small scales, and run gnn, greedy, random, and mlp at large scales. As an instance, snapshots of a simulated 20 vs. 20 perimeter defense game in top view at terminal time T_f using the four algorithms are displayed in Figure 6. The four subfigures (a)-(d) show that these algorithms exhibit different performance even though the game begins with the same initial configuration in all cases. The numbers of captures by gnn, greedy, random, and mlp are 12, 11, 10, and 7, respectively.
The overall results of the percentage of intruders caught by each of these methods are depicted in Figure 7. It is observed that gnn outperforms other baselines in all cases, and performs close to expert at the small scales. In particular, given that our default team size N def is 10, the performance of our proposed algorithm stays competitive with that of the expert policy near N = 10.
At large scales, the percentage of captures by gnn stays constant, which indicates that the trained network generalizes well to large scales even though training was performed at a small scale. The percentage of captures by greedy also stays roughly constant but is much worse than gnn as the team size gets large. At small scales, only a few combinations are available when matching defender-intruder pairs, and thus the greedy algorithm performs similarly to the expert algorithm. As the number of agents increases, the number of possible matchings grows exponentially, so the greedy algorithm performs worse as the problem complexity gets much higher. The random approach performs worse than all other algorithms at small scales, but mlp begins to perform worse than random once the team size grows beyond 40. This tendency shows that the policy trained only with an MLP is not scalable: since training is done with 10 agents, it is near-optimal around N = 10, but mlp fails at larger scales and even performs worse than the random algorithm. This confirms that the GNN added to the MLP significantly improves performance. Overall, compared to the other algorithms, gnn performs better at large scales than at small scales, which validates that the GNN helps the network become scalable.
To quantitatively evaluate the proposed method, we report the absolute accuracy and comparative accuracy (defined in Section 5.2) in Table 2 and Table 3. As expected, the absolute accuracy reaches its maximum as the team size approaches N = 10. The values of the absolute accuracy are fairly consistent except when N = 2; we conjecture that with only two defenders there is little information to share, and depending on the initial configuration a defender may sense no intruders at all.

Table 3. Accuracy for large scales

The comparative accuracy between gnn and expert shows that our trained policy gets much closer to the expert policy as N approaches 10, and we expect the performance of gnn to remain close to that of expert even at large scales. The comparative accuracy between gnn and the other baselines shows that our trained networks perform much better than the baseline algorithms at large scales (N ≥ 40), with an average of 1.5 times more captures. The comparative accuracy between gnn and random is somewhat noisy across team sizes due to the nature of randomness, but we observe that our policy outperforms the random policy with an average of 1.8 times more captures at both small and large scales. We also observe that mlp performs much worse than the other algorithms at large scales.
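The exact metric definitions live in Section 5.2 and are not reproduced here; the sketch below assumes a plausible reading (absolute accuracy as the fraction of intruders captured, comparative accuracy as a ratio of captures between two policies) purely for illustration.

```python
def absolute_accuracy(captures, num_intruders):
    # assumed reading: fraction of intruders the policy captures
    return captures / num_intruders

def comparative_accuracy(captures_ours, captures_baseline):
    # assumed reading: ratio of captures between two policies
    return captures_ours / captures_baseline

# using the 20 vs. 20 game of Figure 6 (gnn: 12 captures, greedy: 11)
print(absolute_accuracy(12, 20))                  # 0.6
print(round(comparative_accuracy(12, 11), 2))     # 1.09
```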
Based on these comparisons, we demonstrate that our proposed networks, which are trained at a small scale, can generalize to large scales. Intuitively, one might expect greedy to perform best in a decentralized setting, since each defender does its best to minimize the value of the game (defined in Eq. 5). However, greedy does not know the intentions of nearby defenders (e.g., which intruders they plan to capture), so it cannot approach the performance of the centralized expert algorithm. Our method uses graph neural networks to exchange information among nearby defenders, each of which perceives its local features, to plan the final actions of the defender team; implicit information about where nearby defenders are likely to move is thereby transmitted to each neighbor. Since the centralized expert policy knows all the defenders' intentions, our GNN-based policy learns these intentions through the communication channels. This collaboration within the defender team is the key reason our gnn outperforms the greedy approach. These results validate that GNNs are well suited to our problem, with decentralized communication that captures neighboring interactions and transferability that allows generalization to unseen scenarios.
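The neighbor-aggregation idea behind this intention sharing can be sketched as a single message-passing layer; the layer form, weight shapes, and random communication graph below are illustrative assumptions, not the paper's architecture. Each defender mixes its own local features with the mean of its communicating neighbors' features, so information about nearby defenders propagates one hop per layer.

```python
import numpy as np

def gnn_layer(X, A, W_self, W_nbr):
    # X: (N, F) local defender features; A: (N, N) communication adjacency
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
    msg = (A @ X) / deg  # mean over each defender's communicating neighbors
    return np.tanh(X @ W_self + msg @ W_nbr)

rng = np.random.default_rng(0)
N, F, H = 10, 4, 8                       # 10 defenders, 4 local features
X = rng.normal(size=(N, F))
A = (rng.random((N, N)) < 0.3).astype(float)
np.fill_diagonal(A, 0)                   # no self-loops in the comm graph
H1 = gnn_layer(X, A, rng.normal(size=(F, H)), rng.normal(size=(F, H)))
print(H1.shape)  # (10, 8)
```

Because the same weights are shared by every node, the layer applies unchanged to any team size N, which is the transferability property exploited above.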

Performance vs. Number of expert demonstrations
To analyze the algorithm performance, we trained our GNN-based architecture with different numbers of expert demonstrations (e.g., 10 million, 1 million, 100k, and 10k). The percentage of intruders caught (average and standard deviation over 10 trials) for team sizes 10 ≤ N ≤ 50 is shown in Figure 8. The plot validates that our proposed network learns better with more demonstrations.

Performance vs. Perimeter radius
We tested the proposed GNN-based method with different perimeter radii. Intuitively, for a fixed number of agents, increasing the radius may lead to failure of the defense system. We set the default defender team size to 40 and increase the perimeter radius until the percentage of intruders caught converges to zero. As shown in Figure 9, the percentage decreases as the radius grows from 100 m to 800 m, converging to zero.

Performance vs. Number of intruders sensed
The performance of our GNN-based approach with different numbers of sensed intruders (i.e., N_f^A) is shown in Figure 10. We ran the experiments with N_f^A set to 1, 3, 5, and 10, since no ground-truth expert policy is available to generate training data for numbers larger than 10. We observe that the more intruder features are sensed, the better the performance. Furthermore, the performance discrepancy tends to shrink as the team size grows. For some team sizes (e.g., 40), a higher N_f^A performs much better, but this is expected given the initial configuration of the game: if the initial configuration is very sparse, a defender benefits from a higher N_f^A, and the percentage of intruders caught will be higher.
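A minimal sketch of this sensing model, under the assumption that a defender simply keeps the relative states of its N_f^A closest intruders (function and variable names are hypothetical):

```python
import math

def nearest_intruders(defender, intruders, k):
    # hypothetical sensing model: a defender observes only its k closest
    # intruders, sorted by Euclidean distance (ties keep input order)
    by_dist = sorted(intruders, key=lambda p: math.dist(defender, p))
    return by_dist[:k]

d = (0.0, 0.0)
intr = [(3, 4), (1, 1), (6, 8), (0, 2), (5, 0)]
print(nearest_intruders(d, intr, 3))  # [(1, 1), (0, 2), (3, 4)]
```

With a sparse initial configuration, raising k (i.e., N_f^A) adds genuinely new intruders to each defender's view, which matches the larger gains observed for some team sizes.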

Limitations
As perimeter defense is a relatively new field of research, this work rests on some limiting assumptions. In the problem formulation, we assume the robots are point particles, and accordingly that optimal trajectories obey first-order dynamics. There is preliminary work to bridge the gap between point-particle assumptions and three-dimensional robots for one-on-one hemisphere perimeter defense, and we hope to extend its ideas to our multi-agent perimeter defense problem in the future. Another limitation is that no expert policy is available at large scales against which to compare our proposed method. Running the maximum matching algorithm is very expensive at large scales, so we instead compare our GNN-based algorithm with the other baseline methods. Although the consistent performance of the tested algorithms across scales confirms that our trained networks generalize to large scales, we hope to explore another algorithm that can serve as an expert policy at large scales in place of maximum matching. One option is reinforcement learning, which would make expert-level performance at large scales available for comparison.

CONCLUSION
This paper proposes a novel framework that employs graph neural networks to solve the decentralized multi-agent perimeter defense problem. Our learning framework takes the defenders' local perceptions and the communication graph as inputs and returns actions that maximize the number of captures for the defender team. We train deep networks supervised by an expert policy based on the maximum matching algorithm. To validate the proposed method, we run the perimeter defense game with different team sizes using five algorithms: expert, gnn, greedy, random, and mlp. We demonstrate that our GNN-based policy stays close to the expert policy at small scales and that the trained networks generalize to large scales.
One direction for future work is to implement vision-based local sensing for the perception module, which would relax the assumption of perfect state estimation. Realizing multi-agent perimeter defense with vision-based perception and communication among the defenders is an end goal. Another future research direction is to find a centralized expert policy for multi-robot systems by utilizing reinforcement learning.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
EL, LZ, and VK contributed to conception and design of the study. EL and AR performed the statistical analysis. EL wrote the first draft of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.