Analysis of Pairwise Interactions in a Maximum Likelihood Sense to Identify Leaders in a Group

Mwaffo, Violet; Butail, Sachit; Porfiri, Maurizio

doi:10.3389/frobt.2017.00035

METHODS article

Front. Robot. AI, 31 July 2017

Sec. Robot Learning and Evolution

Volume 4 - 2017 | https://doi.org/10.3389/frobt.2017.00035

This article is part of the Research Topic Novel Technological and Methodological Tools for the Understanding of Collective Behaviors


Violet Mwaffo¹

Sachit Butail²

Maurizio Porfiri¹*

¹Department of Mechanical and Aerospace Engineering, New York University Tandon School of Engineering, Brooklyn, NY, United States
²Department of Mechanical Engineering, Northern Illinois University, DeKalb, IL, United States

Collective motion in animal groups manifests itself in the form of highly coordinated maneuvers determined by local interactions among individuals. A particularly critical question in understanding the mechanisms behind such interactions is to detect and classify leader–follower relationships within the group. In the technical literature of coupled dynamical systems, several methods have been proposed to reconstruct interaction networks, including linear correlation analysis, transfer entropy, and event synchronization. While these analyses have been helpful in reconstructing network models from neuroscience to public health, rules on the most appropriate method to use for a specific dataset are lacking. Here, we demonstrate the possibility of detecting leaders in a group from raw positional data in a model-free approach that combines multiple methods in a maximum likelihood sense. We test our framework on synthetic data of groups of self-propelled Vicsek particles, where a single agent acts as a leader and both the size of the interaction region and the level of inherent noise are systematically varied. To assess the feasibility of detecting leaders in real-world applications, we study a synthetic dataset of fish shoaling, generated by using a recent data-driven model for social behavior, and an experimental dataset of pharmacologically treated zebrafish. Not only does our approach offer a robust strategy to detect leaders in synthetic data but it also allows for exploring the role of psychoactive compounds on leader–follower relationships.

1. Introduction

It is generally hypothesized that the movement of animal groups is steered by influential individuals called leaders, which benefit the collective by locating food sources (Giardina, 2008) and protecting against predatory attacks (Partridge, 1982; Ballerini et al., 2008). Further, it is believed that these individuals accomplish these tasks by relying on environmental information available to them rather than on social feedback (Dyer et al., 2009; King et al., 2009). Past studies in collective animal behavior have explained the emergence of leadership through several mechanisms, including the availability of extra-group knowledge (Krause and Ruxton, 2002; Ioannou et al., 2011), hunger (Krause et al., 1992; Krause, 1993), personality traits (Leblond and Reebs, 2006; Nakayama et al., 2012), and morphophysiological variations (Reebs, 2001).

We work with the definition of leadership by Krause et al. (2000) “as the initiation of new directions of locomotion by one or more individuals which are then readily followed by other group members.” Under the assumption that leadership roles within an animal group are consistent through time and space within the duration of an experimental observation, we seek to identify leaders on the basis of the strength and direction of pairwise interactions among individuals. A leader will be recognized as an individual that exerts a strong one-directional interaction on other group members, while being marginally responsive to their behavior. The interaction between pairs of individuals can be quantified through correlation or information-theoretic measures that capture the directional relationship between the time series of motion data of the individuals. These include cross-correlation (Engel et al., 1990), event synchronization (Quiroga et al., 2002), and information-theoretic measures, such as transfer entropy (Schreiber, 2000), conditional transfer entropy (Sun et al., 2014), maximum entropy (Cavagna et al., 2014), causation entropy (Sun and Bollt, 2014), and union transfer entropy (Anderson et al., 2016).

Each of these measures has its advantages and limitations. Cross-correlation has been successfully used to identify leader–follower relationships from movement data of fish shoals (Krause et al., 2000; Ladu et al., 2014), but it assumes a linear relationship between the time series and is therefore less suited to dissecting complex dependencies that involve varying time delays and non-linear relationships (Ianniello, 1982; Peterson et al., 1998). Event synchronization measures synchronicity between extreme events in the time series (Quiroga et al., 2002) and has been used to identify connectivity structures in atmospheric processes (Malik et al., 2012) and legal policy data (Grabow et al., 2016), under the premise that so-called extreme events occur within the time series. Information-theoretic measures, like transfer entropy, have the advantage of being model-free (Steuer et al., 2002; Hlaváčková-Schindler et al., 2007; Vicente et al., 2011), and thereby enable the analysis of time series with varying delays and non-linear relationships. However, since the estimation of these measures requires computing probability distributions, information-theoretic quantities are data hungry (Ito et al., 2011). The duration of observations required to reliably identify relationships between time series increases exponentially with the dimensionality of the dataset (Ito et al., 2011), such that the treatment of multidimensional time series is considerably more challenging than that of scalar ones.

Animals are likely to communicate within a group through both linear and non-linear dependencies, mediated by unknown delays, making it difficult to pinpoint the specific measure that will perform best for a given dataset of group behavior. Accordingly, each of the measures mentioned above may be useful in identifying leaders under some conditions, and an approach that integrates these individual measures could offer a viable way to study leadership. We detect leader–follower relationships by setting thresholds on average values of pairwise interactions obtained from three different methods: cross-correlation (Engel et al., 1990), event synchronization (Quiroga et al., 2002), and transfer entropy (Schreiber, 2000). To further improve the performance of leader detection beyond that of any of these methods, we combine them in a maximum likelihood sense to build a single classifier for detecting leaders (Barreno et al., 2008).

Validating this approach would be difficult on real behavioral data, where one may have limited knowledge of, and control over, leadership. Unlike in computer simulations of self-propelled particles, where leadership roles can be assigned artificially, the identification of leaders within animal groups is hampered by the lack of a ground truth. In this context, we turn to self-propelled particle models to evaluate methods that can identify leaders in group motion. Self-propelled particle models can range from the simplest, where the individuals orient themselves in the general direction of their neighbors (Vicsek et al., 1995; Vicsek and Zafeiris, 2012), to more complex models where interactions include collision avoidance, attraction, and alignment (Aoki, 1982; Couzin et al., 2002, 2005). Data-driven models that incorporate detailed individual dynamics along with species-specific interactions (Gautrais et al., 2009, 2012; Kolpas et al., 2013; Borzí and Wongkaew, 2015; Mwaffo et al., 2015a, 2017; Zienkiewicz et al., 2015a,b; Collignon et al., 2016) provide an even more realistic setup to create such roles and test methods for identifying leaders.

We test our approach on a synthetic dataset comprising simulations of self-propelled particles interacting according to the Vicsek model (Vicsek et al., 1995). A single particle that is not responsive to the rest of the group is assigned the role of a leader. We compare the performance of each individual classifier, as well as that of the combined classifier, in terms of their ability to detect the leader particle. We systematically vary the level of inherent uncertainty and the size of the region of interaction, thereby modulating the degree of coordination within the group (Vicsek et al., 1995). Upon demonstrating the validity of the approach, we investigate its use in the study of realistic data on gregarious fish shoaling. First, we apply the method to detect leaders in an established data-driven model of fish social behavior (Gautrais et al., 2012). Then, we consider experimental data from our group on the social behavior of pharmacologically treated zebrafish, in which one fish is exposed to a moderate level of caffeine to elicit a psychostimulant effect (Fisone et al., 2004; Ferré, 2008). Such a psychostimulant effect could be hypothesized to promote leadership by potentially reducing social responsiveness and increasing the level of activity of the treated subject, which could then be recognized as a leader by untreated fish (Ladu et al., 2014; Shams and Gerlai, 2016).

The paper is organized as follows. In Section 2, we describe the three classification methods used for studying pairwise interactions in networks of dynamical systems. In Section 3, we explain our approach to detect leadership from raw time series of positional data. We evaluate the performance of all classifiers—individual and combined—on datasets consisting of particles interacting according to the Vicsek model in Section 4. In Section 5, we demonstrate the use of our approach on realistic simulation data and experimental observations on fish collective behavior. We conclude the manuscript with a discussion of the results and performance of the approach.

2. Quantifying Pairwise Interactions in Networks of Dynamical Systems

The process of detecting leaders in a group begins with measuring time series of individual motion, from which we seek to uncover social interactions. These time series can be obtained from simulated or experimental data. Specifically, for each individual i, i = 1, …, N, where N is the group size, we register a scalar time series $\{x_{t}^{(i)}\}_{t=1}^{T}$, where T is the duration of the time series and t is the time step. This time series could, for example, represent a salient observable of swimming activity, such as turn rate, orientation, or positional preference with respect to a target stimulus.

To infer leader–follower relationships between a pair of individuals i and j, we examine three methods, namely, cross-correlation (CC), transfer entropy (TE), and event synchronization (ES). Unlike our previous work (Butail et al., 2016), which focused on fish pairs and considered each classification method separately, here we address the more general problem of leader detection in groups in a maximum likelihood sense that integrates the three classifiers. For a pair of individuals and a given method, we construct a one-directional relationship between the individuals, whose magnitude measures the strength of the interaction and whose direction is always from the leader to the follower. If neither individual in the pair is identified as a leader, the strength is set to zero. In general, each method could reveal a different leader–follower relationship for a given pair, and even when the methods agree on who is the leader and who is the follower, they may assign different strengths to the interaction. We label the strength of the interaction between i and j as ${CL}_{ij}^{(\cdot)}$, where the dot specifies the selected method (CC, TE, or ES) and CL abbreviates “classifier.”

An intuitive representation of leader–follower relationships within the group can be obtained by considering a directed network, where nodes correspond to individuals and weighted directed edges identify the role of each node in the pair (leader versus follower) and the strength of the interaction. Accordingly, we define the weighted adjacency matrix W, such that $W_{ij}^{(\cdot)} = 0$ if the method detects i as the follower and j as the leader, and $W_{ij}^{(\cdot)} = {CL}_{ij}^{(\cdot)} > 0$ if instead i is the leader for the pair ij. The ith row of W has non-zero elements where the pairwise interactions have i as a leader, and each such entry is the value of the classifier. The ith column of W has non-zero elements for the pairwise interactions where i is instead recognized as a follower, and each such entry is again the value of the classifier. While $W_{ij}^{(\cdot)}$ and $W_{ji}^{(\cdot)}$ cannot both be non-zero, they can both be equal to zero when the method does not identify a leader in the pair. The weighted adjacency matrix thus contains all the information acquired through the analysis of pairwise interactions, recording the role of each node in every possible pairwise interaction and the corresponding strength. Figure 1 illustrates a network of interactions for a group of five individuals, along with the corresponding weighted adjacency matrix, concisely depicting pairwise leader–follower interactions in the group.
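As an illustration of this bookkeeping, the following Python sketch (assuming NumPy; the names `build_adjacency` and `toy_classifier` are hypothetical and not part of the original study) fills W from any pairwise rule that returns the leader of a pair, `"i"`, `"j"`, or `None`, together with a non-negative strength.

```python
import numpy as np

def build_adjacency(series, pair_classifier):
    """Fill the weighted adjacency matrix W from pairwise leader-follower calls.

    series: list of N one-dimensional arrays (one time series per individual).
    pair_classifier(x_i, x_j) -> (leader, strength), where leader is "i", "j",
    or None (no leader detected) and strength is the classifier value CL_ij.
    """
    n = len(series)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):              # each unordered pair is analyzed once
            leader, strength = pair_classifier(series[i], series[j])
            if leader == "i":
                W[i, j] = strength             # edge from leader i to follower j
            elif leader == "j":
                W[j, i] = strength             # edge from leader j to follower i
            # leader is None: W[i, j] and W[j, i] both stay zero
    return W

def toy_classifier(a, b):
    """Hypothetical stand-in rule: the series with the larger first sample leads."""
    if a[0] > b[0]:
        return "i", float(a[0] - b[0])
    if b[0] > a[0]:
        return "j", float(b[0] - a[0])
    return None, 0.0

W = build_adjacency([np.array([3.0]), np.array([1.0]), np.array([2.0])], toy_classifier)
print(W)
```

By construction, at most one of `W[i, j]` and `W[j, i]` is non-zero for any pair, matching the property stated above.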

FIGURE 1

Figure 1. Illustration of pairwise directed interactions between five agents. In a pairwise interaction between two nodes, the edges start from the leader and terminate at the follower, and the weight of the edge, shown as lines of different thickness, is measured by the value of the classifier. The corresponding directed adjacency matrix W is also shown. Based on this, in the network shown above, node 4 acts as the leader for the entire group.

2.1. Cross-Correlation

Cross-correlation measures the similarity between two processes as a function of the time delay τ between them (Knapp and Carter, 1976), that is,

$$r_{ij}(\tau) = \frac{\sum_{t} \left(x_{t}^{(i)} - \bar{x}^{(i)}\right)\left(x_{t-\tau}^{(j)} - \bar{x}^{(j)}\right)}{\sqrt{\sum_{t} \left(x_{t}^{(i)} - \bar{x}^{(i)}\right)^{2}} \, \sqrt{\sum_{t} \left(x_{t-\tau}^{(j)} - \bar{x}^{(j)}\right)^{2}}},$$

(1)

where $\bar{x}^{(i)}$ and $\bar{x}^{(j)}$ denote the time averages of $x_{t}^{(i)}$ and $x_{t}^{(j)}$, and t spans the range over which the two time series overlap for the given delay. The delay τ that maximizes the cross-correlation r_ij(τ) in equation (1), over the range of values between −(T − 1) and T − 1, is called the time lag between the two time series, that is, $τ_{ij}^{⋆} = \arg\max_{τ} r_{ij}(τ)$.

When $τ_{ij}^{⋆} < 0$ , we say that $x_{t}^{(i)}$ anticipates $x_{t}^{(j)}$ , and we identify i as the leader and j as the follower. The numerical value of the corresponding cross-correlation quantifies the strength of the inferred leader–follower interaction, such that, ${CL}_{ij}^{CC} = r_{ij} (τ_{ij}^{⋆})$ .
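A minimal Python sketch of the cross-correlation classifier, assuming NumPy. The authors used MATLAB's xcorr; this from-scratch version is not that function. It evaluates equation (1) directly, with means over the full series as stated above, and, as an assumption, restricts τ to ±`max_lag`, because very large lags leave only a handful of overlapping samples whose normalized correlation can be spuriously close to ±1. All function names are hypothetical.

```python
import numpy as np

def cross_correlation(x_i, x_j, tau):
    """Normalized cross-correlation r_ij(tau) of equation (1).

    Sums run over the overlap, i.e., the t for which both t and t - tau are
    valid indices; the means are taken over the full series.
    """
    x_i, x_j = np.asarray(x_i, float), np.asarray(x_j, float)
    T = len(x_i)
    dx_i, dx_j = x_i - x_i.mean(), x_j - x_j.mean()
    t = np.arange(max(0, tau), min(T, T + tau))
    a, b = dx_i[t], dx_j[t - tau]
    denom = np.sqrt(np.sum(a ** 2)) * np.sqrt(np.sum(b ** 2))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def best_lag(x_i, x_j, max_lag):
    """Return (tau_star, r_ij(tau_star)) over lags -max_lag..max_lag."""
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.array([cross_correlation(x_i, x_j, int(k)) for k in lags])
    k = int(np.argmax(r))
    return int(lags[k]), float(r[k])

def cc_classifier(x_i, x_j, max_lag):
    """Leader call and strength CL_ij^CC; tau_star < 0 means x_i anticipates x_j."""
    tau_star, r_max = best_lag(x_i, x_j, max_lag)
    if tau_star < 0:
        return "i", r_max
    if tau_star > 0:
        return "j", r_max       # by symmetry, r_ji(tau) = r_ij(-tau)
    return None, 0.0            # zero lag: no leader identified

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 3)               # y[t] = x[t - 3]: y follows x by three steps
print(best_lag(x, y, 20))       # expected: tau_star = -3, r close to 1
print(cc_classifier(x, y, 20))  # leader "i"
```

Because r_ji(τ) = r_ij(−τ), swapping the arguments simply flips the sign of the lag and the identity of the leader, so each pair needs to be analyzed only once.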

2.2. Transfer Entropy

The computation of transfer entropy requires a probabilistic treatment of the time series. Specifically, we represent each time series $\{x_{t}^{(i)}\}_{t=1}^{T}$ as a stationary stochastic process $X_{t}^{(i)}$ taking values in a finite set 𝒳. The cardinality of 𝒳 is related to the length of the time series: longer time series allow for a higher-resolution description of the stochastic process and, therefore, a larger cardinality. Transfer entropy (Schreiber, 2000) measures the reduction in uncertainty in predicting the future of one process that is gained from knowledge of the current state of another, beyond what is already provided by the current state of the first process. Transfer entropy from individual j to i is defined as

$${TE}_{j \to i} = \sum_{\mathcal{X}^{3}} p\left(X_{t+1}^{(i)}, X_{t}^{(i)}, X_{t}^{(j)}\right) \log \frac{p\left(X_{t+1}^{(i)} \mid X_{t}^{(i)}, X_{t}^{(j)}\right)}{p\left(X_{t+1}^{(i)} \mid X_{t}^{(i)}\right)}$$

(2)

Here, $p(X_{t+1}^{(i)}, X_{t}^{(i)}, X_{t}^{(j)})$ denotes the joint probability of the future and current states of individual i and the current state of individual j; $p(X_{t+1}^{(i)} \mid X_{t}^{(i)}, X_{t}^{(j)})$ denotes the conditional probability of the future state of individual i given the current states of both individuals i and j; and $p(X_{t+1}^{(i)} \mid X_{t}^{(i)})$ denotes the probability of the future state of individual i conditioned on its current state. The probability distributions can be estimated using histograms (Vejmelka and Palus, 2008) or kernel density estimators (Schreiber, 2000). Transfer entropy is a non-negative quantity, which is equal to zero if individual j has no influence on individual i. In this case, $p(X_{t+1}^{(i)} \mid X_{t}^{(i)}) = p(X_{t+1}^{(i)} \mid X_{t}^{(i)}, X_{t}^{(j)})$.

We say that i is the leader and j the follower if ${TE}_{i \to j} > {TE}_{j \to i}$. The net transfer entropy from the leader to the follower, which is then positive, measures the strength of the interaction, that is, ${CL}_{ij}^{TE} = {TE}_{i \to j} - {TE}_{j \to i}$.
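A histogram (plug-in) estimate of equation (2) and the resulting leader rule can be sketched in Python as below, assuming NumPy. This is not the PROCESS_NETWORK software used in the study; the equal-width binning into `n_bins` symbols, the base-2 logarithm (results in bits), and all function names are assumptions for illustration.

```python
from collections import Counter

import numpy as np

def discretize(x, n_bins):
    """Map a real-valued series to symbols 0..n_bins-1 using equal-width bins."""
    x = np.asarray(x, float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    return np.digitize(x, edges[1:-1])        # interior edges only

def transfer_entropy(x_src, x_dst, n_bins=8):
    """Plug-in estimate (bits) of TE from x_src to x_dst, equation (2)."""
    s, d = discretize(x_src, n_bins), discretize(x_dst, n_bins)
    fut, cur, src = d[1:], d[:-1], s[:-1]     # X_{t+1}^(dst), X_t^(dst), X_t^(src)
    n = len(fut)
    c_abc = Counter(zip(fut, cur, src))
    c_bc = Counter(zip(cur, src))
    c_ab = Counter(zip(fut, cur))
    c_b = Counter(cur)
    te = 0.0
    for (a, b, c), n_abc in c_abc.items():
        # p(a | b, c) / p(a | b) expressed with counts: n_abc * n_b / (n_bc * n_ab)
        ratio = n_abc * c_b[b] / (c_bc[(b, c)] * c_ab[(a, b)])
        te += (n_abc / n) * np.log2(ratio)
    return float(te)

def te_classifier(x_i, x_j, n_bins=8):
    """Leader call and strength CL_ij^TE = |TE_{i->j} - TE_{j->i}|."""
    net = transfer_entropy(x_i, x_j, n_bins) - transfer_entropy(x_j, x_i, n_bins)
    if net > 0:
        return "i", net
    if net < 0:
        return "j", -net
    return None, 0.0

rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
y = np.empty_like(x)
y[0] = 0.0
y[1:] = x[:-1] + 0.1 * rng.standard_normal(4999)   # y copies x with a one-step delay
print(te_classifier(x, y))                          # leader "i", clearly positive strength
```

The number of bins trades resolution against estimation bias: with few samples per cell, plug-in estimates are biased upward, which is consistent with the remark above that information-theoretic quantities are data hungry.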

2.3. Extreme-Event Synchronization

Extreme-event synchronization was proposed in Quiroga et al. (2002) to measure synchronicity between signals by comparing the occurrence of extreme events. Briefly, the times at which extreme events occur in the two time series for individuals i and j are indexed by $\{t_{k}^{i}\}_{k=1}^{m_{i}}$ and $\{t_{k}^{j}\}_{k=1}^{m_{j}}$, where m_i and m_j are the number of extreme events in the time series of i and j, respectively. These sequences identify the time steps at which the processes exceed a predefined threshold in magnitude; we call such instances extreme events. The number of extreme events for i that occur within a window of duration ξ from those for j is

$$c^{\xi}(i \mid j) = \sum_{k=1}^{m_{i}} \sum_{l=1}^{m_{j}} J_{kl}^{\xi},$$

(3)

where

$$J_{kl}^{\xi} = \begin{cases} 1 & \text{if } 0 < t_{k}^{i} - t_{l}^{j} \leq \xi, \\ 1/2 & \text{if } t_{k}^{i} = t_{l}^{j}, \\ 0 & \text{otherwise}. \end{cases}$$

(4)

From the quantity above, we compute event synchronicity and event delay (Quiroga et al., 2002) as follows:

$$Q_{ij}^{\xi} = \frac{c^{\xi}(j \mid i) + c^{\xi}(i \mid j)}{\sqrt{m_{i} m_{j}}},$$

(5)

$$q_{ij}^{\xi} = \frac{c^{\xi}(j \mid i) - c^{\xi}(i \mid j)}{\sqrt{m_{i} m_{j}}}.$$

(6)

Event synchronicity is symmetric and measures the coupling between individuals i and j; event delay is asymmetric and captures which of the two individuals tends to produce extreme events first. By construction, $-1 \leq q_{ij}^{ξ} \leq 1$, such that when $q_{ij}^{ξ} > 0$, extreme events for i systematically precede those for j. We use the sign of event delay to determine leadership, whereby i is the leader if $q_{ij}^{ξ} > 0$. The strength of the interaction is determined by event synchronicity, that is, ${CL}_{ij}^{ES} = Q_{ij}^{ξ}$. By construction, $0 \leq Q_{ij}^{ξ} \leq 1$, with $Q_{ij}^{ξ} = 1$ identifying completely synchronous events.
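Equations (3) to (6) and the leader rule above can be sketched in Python for given event times, assuming NumPy. Following the counting convention of Quiroga et al. (2002), `count_after(t_a, t_b, xi)` counts events of a that occur within ξ after an event of b (simultaneous events count 1/2), so that q_ij > 0 when the events of i tend to precede those of j. The function names are hypothetical, and a single fixed window ξ is assumed.

```python
import numpy as np

def count_after(t_a, t_b, xi):
    """c^xi(a|b): events of a following an event of b by at most xi (ties count 1/2)."""
    c = 0.0
    for ta in t_a:
        for tb in t_b:
            lag = ta - tb
            if 0 < lag <= xi:
                c += 1.0
            elif lag == 0:
                c += 0.5
    return c

def event_sync(t_i, t_j, xi):
    """Return (Q_ij, q_ij) of equations (5) and (6)."""
    norm = np.sqrt(len(t_i) * len(t_j))
    c_i_after_j = count_after(t_i, t_j, xi)   # c^xi(i|j)
    c_j_after_i = count_after(t_j, t_i, xi)   # c^xi(j|i)
    Q = (c_j_after_i + c_i_after_j) / norm
    q = (c_j_after_i - c_i_after_j) / norm
    return float(Q), float(q)

def es_classifier(t_i, t_j, xi):
    """Leader from the sign of q_ij; strength CL_ij^ES = Q_ij."""
    Q, q = event_sync(t_i, t_j, xi)
    if q > 0:
        return "i", Q
    if q < 0:
        return "j", Q
    return None, 0.0

t_i = [10, 50, 90, 130]
t_j = [12, 52, 92, 132]              # every event of j follows one of i by 2 steps
print(event_sync(t_i, t_j, xi=5))    # expected: (1.0, 1.0)
print(es_classifier(t_i, t_j, xi=5))
```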

3. Detecting Leaders in Groups

We define group leaders as individuals that, on average, lead within pairwise interactions with other group members. Using the network representation in Figure 1, we identify a group leader as the node with the largest net weighted degree, measured as the difference between the weighted out-degree and the weighted in-degree. For node i, the weighted out-degree is the sum of all the pairwise interactions in which the individual acts as a leader, that is, $\sum_{j=1}^{N} W_{ij}^{(\cdot)}$. The weighted in-degree is the sum of all the pairwise interactions in which the individual acts as a follower, that is, $\sum_{j=1}^{N} W_{ji}^{(\cdot)}$.

As a result, a group leader may not be a leader in every single pairwise interaction, but will have the strongest average effect on the overall group. Specifically, we define the average pairwise interaction for an individual i as

$${\bar{CL}}_{i}^{(\cdot)} = \frac{1}{N - 1} \sum_{j=1}^{N} \left(W_{ij}^{(\cdot)} - W_{ji}^{(\cdot)}\right)$$

(7)

and we seek to identify which individual maximizes this quantity. Leaders are identified by setting a threshold $T_{(\cdot)}$ on the value obtained from equation (7). This combination of average pairwise interaction and the associated threshold constitutes a single classifier.
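Equation (7) and the threshold rule amount to a row sum minus a column sum of W. The NumPy sketch below uses hypothetical function names and a made-up matrix in which the node with 0-based index 3 leads every other node; the threshold value is arbitrary.

```python
import numpy as np

def average_interaction(W):
    """Equation (7): (weighted out-degree - weighted in-degree) / (N - 1)."""
    N = W.shape[0]
    return (W.sum(axis=1) - W.sum(axis=0)) / (N - 1)

def detect_leaders(W, threshold):
    """Indices whose average pairwise interaction reaches the threshold."""
    cl = average_interaction(W)
    return np.flatnonzero(cl >= threshold), cl

# Hypothetical five-node example: node index 3 leads every other node,
# plus two weaker follower-to-follower edges.
W = np.zeros((5, 5))
W[3, [0, 1, 2, 4]] = 0.5
W[0, 1] = 0.2
W[2, 1] = 0.1
leaders, cl = detect_leaders(W, threshold=0.25)
print(cl)        # approximately -0.075, -0.2, -0.1, 0.5, -0.125
print(leaders)   # node index 3
```

Because every edge adds to one node's out-degree and to another node's in-degree, the values of the classifier across the group sum to zero, so a leader necessarily stands out against negative values for the followers.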

3.1. Classifier Performance

The performance of a classifier is evaluated in terms of its true and false positives and depends on the value of the threshold. A visual aid for comparing different thresholds is the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate over a range of thresholds (Fukunaga, 2013); see, for example, Figure 2.

FIGURE 2

Figure 2. Pictorial illustration of ROC analysis for assessing classifier performance. ROC curves for three hypothetical classifiers are plotted with their respective cutoff points in green, blue, and red. A combined ROC in black is plotted by selecting only three points out of the 2⁹ produced by the maximum likelihood method. For each curve, the solid marker identifies the operating point, and the empty markers label other cutoff points.

In this respect, a good classifier has few false positives and many true positives over a range of thresholds. Classifier performance can be quantified from the ROC curve by calculating the area under the curve (AUC). A perfect classifier has a 100% true positive rate (TPR) for all values of the false positive rate (FPR), and therefore its AUC is 1. In contrast, a classifier that performs at chance level has equal true and false positive rates at every threshold, such that its ROC curve lies on the diagonal, resulting in an AUC of 0.5.

The optimal threshold value, which gives the best performance for a classifier, can be estimated from the ROC curve using several different criteria, including the distance from the top left corner and the Youden index, defined as the difference between TPR and FPR (Youden, 1950). When the Youden index is maximized, the corresponding operating point on the ROC curve, which selects the optimal threshold, lies at the maximum vertical distance from the 45° line.
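The ROC curve, its AUC, and the Youden operating point can be computed from classifier scores and ground-truth labels as in the following Python sketch. It is written from scratch under the assumption that a score at or above the threshold means "leader"; it does not reproduce MATLAB's perfcurve, and the scores and labels are invented for illustration.

```python
import numpy as np

def roc_curve(scores, labels):
    """ROC vertices (FPR, TPR), scanning thresholds from high to low.

    A score >= threshold is classified as 'leader' (outcome 1).
    """
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    thresholds = np.unique(scores)[::-1]          # distinct scores, descending
    n_pos, n_neg = labels.sum(), (~labels).sum()
    fpr, tpr = [0.0], [0.0]                        # threshold = +inf: nobody is a leader
    for thr in thresholds:
        pred = scores >= thr
        tpr.append((pred & labels).sum() / n_pos)
        fpr.append((pred & ~labels).sum() / n_neg)
    return np.array(fpr), np.array(tpr), np.concatenate(([np.inf], thresholds))

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

def youden_operating_point(fpr, tpr, thresholds):
    """Threshold maximizing the Youden index J = TPR - FPR."""
    k = int(np.argmax(tpr - fpr))
    return float(thresholds[k]), float(fpr[k]), float(tpr[k])

# Hypothetical classifier values (e.g., from equation (7)) and ground truth (1 = leader)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
fpr, tpr, thr = roc_curve(scores, labels)
print(auc(fpr, tpr))                            # 14/15, about 0.933
print(youden_operating_point(fpr, tpr, thr))    # threshold 0.6, FPR 0.2, TPR 1.0
```

The AUC here equals the fraction of (leader, follower) pairs in which the leader receives the higher score, which is a convenient sanity check for small examples.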

3.2. Combining Classifiers Using Likelihood Ratio

Multiple classifiers can be combined to yield an optimal performance, as illustrated in Figure 2, where the black curve is closer to an ideal classifier at the top left corner. Specifically, we combine classifiers in the Neyman–Pearson sense in that the resulting optimal classifier maximizes TPR for a given FPR (Barreno et al., 2008).

The output of a classifier, ${\bar{CL}}_{i}^{(\cdot)}$, and the associated threshold $T_{(\cdot)}$ corresponding to the operating point can be mapped into the binary choice set {0, 1}, such that the detection of an individual as a leader, ${\bar{CL}}_{i}^{(\cdot)} \geq T_{(\cdot)}$, corresponds to the outcome 1, and its detection as a follower, ${\bar{CL}}_{i}^{(\cdot)} < T_{(\cdot)}$, corresponds to the outcome 0. For clarity, we suppress the implicit dependence on the threshold and denote a classifier simply as ${\bar{CL}}_{i}^{(\cdot)}$. The likelihood ratio for a combination of classifiers $C = ({\bar{CL}}^{CC}, {\bar{CL}}^{TE}, {\bar{CL}}^{ES})$ is defined as ℓ(C) = P(C|H₁)/P(C|H₀), where H₁ and H₀ correspond to the hypotheses that the individual being evaluated is a leader or a follower, respectively. In this sense, $P_{(\cdot)}^{D} = P({\bar{CL}}_{i}^{(\cdot)} = 1 \mid H_{1})$ corresponds to TPR, and $P_{(\cdot)}^{F} = P({\bar{CL}}_{i}^{(\cdot)} = 1 \mid H_{0})$ to FPR. The Neyman–Pearson lemma states that, for some values of κ ∈ (0, ∞) and γ ∈ [0, 1], the likelihood ratio test

$$\mathcal{D}(C) = \begin{cases} 1 & \text{if } \ell(C) > \kappa, \\ \gamma & \text{if } \ell(C) = \kappa, \\ 0 & \text{if } \ell(C) < \kappa \end{cases}$$

(8)

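has the highest detection rate, P(𝒟(C) = 1|H₁), among all tests whose FPR does not exceed a given bound. Here, 𝒟(C) = γ indicates that the individual is declared a leader with probability γ.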

The optimal values κ* and γ* in the likelihood ratio test are obtained by interpolating between selected points on the ROC curve, including the operating point and the two extreme points (1,1) and (0,0). These two extreme points correspond to the cases in which we always classify an individual as a leader, (1,1), or as a follower, (0,0). By interpolating and moving along this new curve, we can tune the false alarm rate. The new ROC curve constructed in this way is called the likelihood-ratio ROC (LR-ROC) (Barreno et al., 2008). Each region of the LR-ROC corresponds to a different decision rule, such that the analyst can locate and use the combination of classifiers that provides the best performance.
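The randomized test in equation (8) can be made concrete for discrete outcomes: outcomes are declared "leader" in decreasing order of likelihood ratio, and the boundary group, where ℓ(C) = κ, is declared "leader" with probability γ so that the false positive rate equals a target α exactly. This randomization is what realizes the points on the straight segments between ROC vertices. The Python sketch below is an illustration under these assumptions; `neyman_pearson_test` is a hypothetical name.

```python
import math

def neyman_pearson_test(outcomes, alpha):
    """Randomized likelihood ratio test with false positive rate alpha.

    outcomes: iterable of (P(o|H1), P(o|H0)) for each discrete outcome o.
    Returns (kappa, gamma, detection_rate): declare 'leader' when l(o) > kappa,
    with probability gamma when l(o) == kappa, and 'follower' otherwise.
    """
    groups = {}                           # likelihood ratio -> (sum P(o|H1), sum P(o|H0))
    for p1, p0 in outcomes:
        lr = p1 / p0 if p0 > 0 else math.inf
        g1, g0 = groups.get(lr, (0.0, 0.0))
        groups[lr] = (g1 + p1, g0 + p0)
    false_alarm, detection = 0.0, 0.0
    for kappa in sorted(groups, reverse=True):
        p1, p0 = groups[kappa]
        if false_alarm + p0 >= alpha:
            # Randomize on the boundary group so that the FPR equals alpha
            gamma = (alpha - false_alarm) / p0 if p0 > 0 else 1.0
            return kappa, gamma, detection + gamma * p1
        false_alarm += p0
        detection += p1
    return 0.0, 1.0, detection            # alpha > 1: always declare 'leader'

# A single hypothetical classifier with P_D = 0.8 and P_F = 0.2:
# outcome 1 has probabilities (0.8, 0.2), outcome 0 has (0.2, 0.8).
print(neyman_pearson_test([(0.8, 0.2), (0.2, 0.8)], alpha=0.1))   # (4.0, 0.5, 0.4)
```

At α = 0.1 the test lies halfway along the segment from (0, 0) to the operating point (0.2, 0.8), which is exactly the linear interpolation described above.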

Assuming that the classifiers are conditionally independent, that is, $P({\bar{CL}}_{i}^{CC}, {\bar{CL}}_{i}^{TE}, {\bar{CL}}_{i}^{ES} \mid H_{c}) = P({\bar{CL}}_{i}^{CC} \mid H_{c}) \, P({\bar{CL}}_{i}^{TE} \mid H_{c}) \, P({\bar{CL}}_{i}^{ES} \mid H_{c})$, c ∈ {0, 1}, we use the true and false positive rates of each classifier to construct the LR-ROC. Specifically, each classifier has two possible outcomes for an individual: the individual is classified as a follower when the outcome is 0, or as a leader when the outcome is 1. This results in a total of 2³ = 8 possible outcomes for three classifiers. Using the notation $ℓ(1_{(\cdot)}) = P_{(\cdot)}^{D} / P_{(\cdot)}^{F}$ for the likelihood ratio of classifying an individual as a leader, and $ℓ(0_{(\cdot)}) = (1 - P_{(\cdot)}^{D}) / (1 - P_{(\cdot)}^{F})$ for the likelihood ratio of classifying an individual as a follower, the likelihood ratio of each joint outcome is, by conditional independence, the product of the three individual likelihood ratios. We arrange the likelihood ratios of the eight possible outcomes in increasing order. From this ordering, for a given value of the false positive rate, we determine the combined true positive rate as the largest detection probability achievable by declaring a leader for the outcomes with the highest likelihood ratios, and in this way we construct the combined ROC. The outcomes can be represented with Boolean operators (AND, OR, NOT) to make a combined classifier, where the space of Boolean combinations has cardinality $2^{2^{3}} = 256$.
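Under the conditional-independence assumption, the combined ROC can be sketched in Python by enumerating the 2ⁿ joint outcomes, multiplying per-classifier probabilities, and accumulating outcomes in decreasing order of likelihood ratio (the increasing order mentioned above, read from its end, so the curve is traced from (0,0) to (1,1)). The (P_D, P_F) values are invented, `lr_roc` is a hypothetical name, and ties in the likelihood ratio are broken arbitrarily.

```python
import itertools

import numpy as np

def lr_roc(rates):
    """Combined ROC of conditionally independent binary classifiers.

    rates: list of (P_D, P_F) pairs, one per binary classifier.
    Returns FPR and TPR of the ROC vertices and the order in which joint
    outcomes are declared 'leader'.
    """
    scored = []
    for outcome in itertools.product((0, 1), repeat=len(rates)):
        # Conditional independence: joint probabilities are products
        p1 = np.prod([pd if o else 1 - pd for o, (pd, pf) in zip(outcome, rates)])
        p0 = np.prod([pf if o else 1 - pf for o, (pd, pf) in zip(outcome, rates)])
        lr = p1 / p0 if p0 > 0 else np.inf
        scored.append((lr, p1, p0, outcome))
    scored.sort(key=lambda item: item[0], reverse=True)   # highest ratio first
    fpr, tpr = [0.0], [0.0]
    for _, p1, p0, _ in scored:
        fpr.append(fpr[-1] + p0)
        tpr.append(tpr[-1] + p1)
    return np.array(fpr), np.array(tpr), [item[3] for item in scored]

# Three hypothetical operating points (P_D, P_F) for CC, TE, and ES
rates = [(0.7, 0.2), (0.9, 0.1), (0.8, 0.15)]
fpr3, tpr3, order3 = lr_roc(rates)
print(np.round(fpr3, 3))
print(np.round(tpr3, 3))
print(order3[0])          # (1, 1, 1): all three classifiers vote 'leader'
```

Each vertex corresponds to the Boolean rule "declare a leader if the observed joint outcome is one of those accumulated so far", an OR of conjunctions of individual outcomes.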

In practice, we combine the three classification methods by using three points on their respective ROC curves. Selecting a small subset of points on the ROC curves primarily serves to contain the intensive computational cost associated with searching for the optimal classifier among all possible Boolean combinations (Barreno et al., 2008). Accordingly, we select three points per classifier: close to the 25% and 50% quantiles, and at the operating point of the ROC curve. Further, in the event that the performance of the combined classifier, measured by the AUC, is lower than that of any of the individual classifiers, owing to the selection of only three points for the combination, we force the combined method to match the convex hull of the three classifiers.

Finding the Boolean rule that corresponds to a location on the combined ROC, built using three points per individual classifier,¹ involves searching through a space of $2^{2^{9}} \approx 1.3 \times 10^{154}$ Boolean combinations of outcomes, which is computationally intractable. The combined ROC nevertheless retains its value, since it provides an upper reference bound against which simple Boolean rules that can be easily implemented on a dataset may be tested. Such a comparison could be performed by computing the distance between the operating point on the combined ROC and the point that corresponds to a candidate Boolean rule (Khreich et al., 2010).
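The counts quoted above can be checked with exact integer arithmetic. Each of the 2ⁿ joint outcomes of n binary outputs can be mapped to either "leader" or "follower", giving 2^(2ⁿ) rules; `n_boolean_rules` is an illustrative helper.

```python
def n_boolean_rules(n_outputs):
    """Number of Boolean rules mapping 2**n_outputs joint outcomes to {0, 1}."""
    return 2 ** (2 ** n_outputs)

print(n_boolean_rules(3))                  # 256: three classifiers, one point each
big = n_boolean_rules(9)                   # three points on each of three ROC curves
print(len(str(big)), f"{float(big):.2e}")  # 155 digits, about 1.34e+154
```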

The maximum likelihood combination of classifiers is a general approach that can accommodate more classifiers beyond the three considered in this work. However, as the space of Boolean combinations of classifier outcomes grows doubly exponentially with the number of binary classifier outputs (Barreno et al., 2008), finding the optimal combination quickly becomes infeasible in practice. The combined ROC curve nonetheless provides an upper bound against which candidate Boolean combinations can be evaluated for use on real datasets.

4. Classifying Leaders in Vicsek Self-Propelled Particles

4.1. Modeling Leadership

We adapt the self-propelled particle model proposed by Vicsek et al. (1995), hereafter VM, to include leaders, defined as individuals that do not adjust their heading in response to the rest of the group. Leaders change their heading only as a result of inherent uncertainty; this behavior could be associated with some prior knowledge of the environment that manifests as a preference for a given direction. Followers, instead, update their heading based on the headings of their neighbors, under the effect of inherent uncertainty. In particular, the model consists of N particles moving in a square of side length L with periodic boundary conditions.

In the complex plane, the position $x_{i} \in \mathbb{C}$ and orientation θ_i of the ith particle change in time as

$$x_{i}(t+1) = x_{i}(t) + v \, e^{I \theta_{i}(t+1)},$$

(9a)

$$\theta_{i}(t+1) = \operatorname{Arg}\left[U_{i}(t)\right] + \eta \zeta,$$

(9b)

where Arg[⋅] is the phase of a complex number; I is the imaginary unit; v is the constant, common speed; η ≥ 0 is the noise intensity; and ζ is a random variable uniformly distributed in [−π, π). The vector U_i(t) defines the desired heading of the ith particle, such that

$$U_{i}(t) = \begin{cases} \dfrac{1}{|N_{i}(t)|} \sum_{j \in N_{i}(t)} e^{I \theta_{j}(t)}, & \text{if } i \text{ is a follower}, \\ e^{I \theta_{0}}, & \text{if } i \text{ is a leader}, \end{cases}$$

(10)

where θ₀ is the preferred heading of the leader. Here, $N_{i}(t) = \{j = 1, \dots, N : |x_{i}(t) - x_{j}(t)| \leq r\}$ is the set of $|N_{i}(t)|$ individuals within a circle of radius r > 0 from the ith particle, which includes the ith particle itself. From r and L, one may estimate the average number of neighbors with which a given particle interacts at any time step as $1 + π \frac{r^{2}}{L^{2}} (N - 1)$ (see, for example, Aldana et al., 2007).
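A minimal Python sketch of equations (9) and (10), assuming NumPy. Two details are not specified above and are assumptions of this example: distances in the periodic square use the minimum-image convention, and ζ is drawn independently for each particle at each step. The default parameters mirror those quoted for Figure 3; `simulate_vicsek` and `expected_neighbors` are hypothetical names.

```python
import numpy as np

def expected_neighbors(N, r, L):
    """Mean neighborhood size 1 + pi r^2 (N - 1) / L^2 (self included)."""
    return 1 + np.pi * r ** 2 * (N - 1) / L ** 2

def simulate_vicsek(N=5, L=1.0, v=0.01, r=0.23, eta=0.21, steps=2000,
                    leaders=(0,), theta0=0.0, seed=None):
    """Headings theta_i(t), t = 0..steps, under equations (9) and (10)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, L, N) + 1j * rng.uniform(0, L, N)   # positions in C
    theta = rng.uniform(-np.pi, np.pi, N)
    is_leader = np.zeros(N, dtype=bool)
    is_leader[list(leaders)] = True
    headings = np.empty((steps + 1, N))
    headings[0] = theta
    for t in range(steps):
        # Minimum-image separations in the periodic square (assumption)
        d = x[:, None] - x[None, :]
        dx = (d.real + L / 2) % L - L / 2
        dy = (d.imag + L / 2) % L - L / 2
        neighbors = np.hypot(dx, dy) <= r            # N_i(t), includes i itself
        U = (neighbors * np.exp(1j * theta)[None, :]).sum(axis=1) / neighbors.sum(axis=1)
        U[is_leader] = np.exp(1j * theta0)           # leaders ignore the group
        zeta = rng.uniform(-np.pi, np.pi, N)         # fresh noise per particle and step
        theta = np.angle(U) + eta * zeta             # equation (9b)
        x = x + v * np.exp(1j * theta)               # equation (9a)
        x = (x.real % L) + 1j * (x.imag % L)         # periodic boundaries
        headings[t + 1] = theta
    return headings

print(expected_neighbors(5, 0.23, 1.0))   # about 1.66
h = simulate_vicsek(steps=1000, seed=0)
print(h.shape)                            # (1001, 5)
```

With the values used for Figure 3 (N = 5, L = 1, r = 0.23), each particle interacts on average with fewer than one other particle besides itself, which helps explain why coordination depends so strongly on r.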

Using the VM, we simulate 30 realizations of a group of N = 5 self-propelled particles. The simulations are initialized by drawing the particle positions uniformly in a square of side length L = 1, with orientations uniformly sampled from [−π, π). Simulations are performed for 20,000 time steps. The turn rate of each particle is computed from its heading angle as θ_i(t + 1) − θ_i(t) for the ith particle, and is used to evaluate pairwise interactions using cross-correlation, transfer entropy, and event synchronization. Turn rate is selected as the key variable for measuring pairwise interactions based on the structure of the VM, in which the only interaction rule is alignment and each particle consistently uses its previous heading in the computation of the current heading. As a result, pairwise interactions are likely to manifest in changes of the turn rates.
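If headings are kept in (−π, π], as returned by a phase function, the raw difference θ_i(t + 1) − θ_i(t) jumps by about 2π whenever a particle crosses the branch cut. The sketch below wraps the difference into [−π, π); this wrapping is an assumption of the example rather than a step stated above, and `turn_rate` is a hypothetical name.

```python
import numpy as np

def turn_rate(headings):
    """theta_i(t+1) - theta_i(t) along axis 0, wrapped into [-pi, pi)."""
    dtheta = np.diff(np.asarray(headings, float), axis=0)
    return (dtheta + np.pi) % (2 * np.pi) - np.pi

# A heading that crosses the branch cut at +/-pi turns by a small angle,
# not by almost a full revolution.
print(turn_rate([3.1, -3.1]))   # about [0.083], i.e., 2*pi - 6.2
```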

4.2. Classification

Cross-correlation is computed over the entire length of the time series using the MATLAB function xcorr. Transfer entropy is computed using PROCESS_NETWORK_v.1.4 software (Ruddell and Kumar, 2009) by estimating the joint probability densities in equation (2) through histograms. The software is run with a total of 18 bins to differentiate the net transfer entropy between group leaders and followers in the VM (see Figure S1 in Supplementary Material). Event synchronization is computed using the MATLAB function Event_sync developed by Quiroga et al. (2002). To evaluate extreme-event synchronization, similar to Quiroga et al. (2002), the time series of extreme events are extracted from the absolute turn rate by finding a local maximum over a window of 30 data points. Events in the two time series are considered synchronous if the time lag between them is smaller than half the minimum time lag between successive extreme events in each series (Quiroga et al., 2002). The ROC curves are plotted using the MATLAB function perfcurve.
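The extraction of extreme events and the choice of the synchronization window can be sketched in Python as follows, assuming NumPy. The phrase "a local maximum over a window of 30 data points" is read here as the argmax of the absolute turn rate within each non-overlapping window of 30 samples, and ξ is taken as a single global value (half the minimum inter-event interval in either series), whereas Quiroga et al. (2002) define the window locally for each pair of events. Both readings and the function names are assumptions; this does not reproduce the MATLAB function Event_sync.

```python
import numpy as np

def extreme_events(turn_rate, window=30):
    """Event times: index of the largest |turn rate| in each non-overlapping window."""
    a = np.abs(np.asarray(turn_rate, float))
    n_windows = len(a) // window
    return np.array([k * window + int(np.argmax(a[k * window:(k + 1) * window]))
                     for k in range(n_windows)])

def sync_window(t_i, t_j):
    """xi: half the minimum interval between successive events in either series."""
    return 0.5 * min(np.diff(t_i).min(), np.diff(t_j).min())

tr = np.zeros(90)
tr[[5, 40, 75]] = [2.0, -3.0, 1.0]     # synthetic spikes in the turn rate
t_i = extreme_events(tr)               # event times 5, 40, 75
t_j = np.array([10, 20, 80])
print(t_i, sync_window(t_i, t_j))      # xi = 5.0
```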

Figure 3 illustrates the numerical values of the classification indices in equation (7) for a group of N = 5 particles without a leader, with one leader, and with two leaders. For this example, cross-correlation is affected by large standard deviations that may mask the success of the detection. Transfer entropy and event synchronization, instead, consistently identify leaders in the group based on the direction and strength of pairwise interactions. To offer some statistical ground for comparing the methods and to help assess the role of model parameters, we next analyze AUC values, focusing on the case of a single leader in the group.

FIGURE 3

Figure 3. Classification index ${\bar{CL}}_{i}$ for particles i = 1, …, N, computed for cross-correlation (A,D,G), transfer entropy (B,E,H), and event synchronization (C,F,I), without a leader (A–C), with one leader (D–F), and with two leaders (G–I). Each simulated group includes five identical particles (i = 1, …, 5), and the Vicsek model parameters are set to v = 0.01, r = 0.23, and η = 0.21. Each bar refers to the mean value of the classifier across 30 simulations, and the error bar is one standard deviation. The numbering of particles that are not leaders is arbitrary; in panels (D–F), particle 1 is the leader, and in panels (G–I), particles 1 and 2 are leaders.

Using ROC analysis, we evaluate the performance of the three classification methods in identifying leadership by varying the interaction radius r and the noise intensity η, while keeping the remaining model parameters constant. Figures 4A–C present the AUC of the three classifiers as the noise intensity and the radius of interaction are varied. In agreement with our expectations based on the representative single-leader case in Figure 3D, cross-correlation is seldom able to correctly identify the leader in the group. For reference, the case displayed in Figure 3D has an AUC of 0.51. A likely reason for the limited performance of cross-correlation in detecting leaders in the VM is the presence of high-frequency noise in the turn rate, associated with the numerical differentiation of the noise that mediates the orientation update in the model. This noise is likely to suppress linear leader–follower relationships that cross-correlation might otherwise detect.
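The AUC values used throughout summarize each ROC curve. As a sketch of what the AUC measures (not of the MATLAB perfcurve routine used in the text), it can be computed directly from pooled classifier scores through the Mann–Whitney rank statistic: the probability that a randomly chosen leader scores higher than a randomly chosen follower, with ties counted as one half. The function name and the pooling of particles across simulations are illustrative assumptions.

```python
import numpy as np


def auc(scores, is_leader):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    Equals the probability that a randomly chosen leader scores higher than
    a randomly chosen follower, counting ties as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_leader, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one leader and one follower")
    greater = np.count_nonzero(pos[:, None] > neg[None, :])
    ties = np.count_nonzero(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

An AUC of 0.5 corresponds to chance-level separation of leaders from followers, and an AUC of 1.0 to perfect separation.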

FIGURE 4

Figure 4. Performance of the three classification methods in detecting the single leader measured by their AUC as a function of the radius of interaction r and the noise intensity η for cross-correlation (A), transfer entropy (B), and event synchronization (C), with N = 5 and v = 0.01.

Transfer entropy shows excellent performance for every radius of interaction and for noise intensities between 0.1 and 0.8; for reference, the single-leader case displayed in Figure 3E has an AUC of 1.00. Excessively low noise results in all the particles aligning with the leader's direction in a crystallized formation that does not promote information transfer. In this case, all the particles travel along the leader's constant direction, such that the entropy of each group member is zero. For intensities above 0.8, the particles are nearly independent, such that their orientation update is entirely controlled by noise. In this case, although each particle has a large entropy, the interactions between the particles are masked by individual noise, and the transfer entropy between any pair of particles vanishes. Increasing the length of the time series could widen the range of noise intensities over which the method succeeds, although dealing with long time series is only realistic for synthetic data. Although transfer entropy is based on the premise of pairwise interactions, the classification method succeeds in isolating the leader for large values of the radius of interaction, which lead to higher-order interactions. This success could be attributed to the use of the average net transfer entropy across all pairs to construct the classifier, which mitigates biases associated with follower-to-follower interactions. Systems composed of a very large number of particles or with strong heterogeneities could limit the success of the classifier.
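The vanishing of transfer entropy at both noise extremes can be made concrete with a plug-in histogram estimator. The sketch below assumes one-step histories, equal-width bins (18, as in the text), and base-2 logarithms, and writes transfer entropy from a source s to a target x as the conditional mutual information I(x_{t+1}; s_t | x_t), expanded into joint entropies. It is not the PROCESS_NETWORK implementation, equation (2) may differ in details such as lags, and the function names are hypothetical. A constant target, as in the crystallized regime, returns exactly zero.

```python
import numpy as np


def _discretize(x, bins):
    """Map a series to equal-width bin indices 0..bins-1."""
    lo, hi = x.min(), x.max()
    if hi == lo:  # constant series: a single state, hence zero entropy
        return np.zeros(len(x), dtype=int)
    interior = np.linspace(lo, hi, bins + 1)[1:-1]
    return np.digitize(x, interior)


def _entropy(*cols):
    """Joint Shannon entropy (bits) of equally long discrete columns."""
    _, counts = np.unique(np.stack(cols, axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def transfer_entropy(source, target, bins=18):
    """Plug-in estimate of TE(source -> target) with one-step histories."""
    s = _discretize(np.asarray(source, dtype=float), bins)
    x = _discretize(np.asarray(target, dtype=float), bins)
    x_next, x_now, s_now = x[1:], x[:-1], s[:-1]
    # TE = H(x_next, x_now) + H(x_now, s_now) - H(x_now) - H(x_next, x_now, s_now)
    return (_entropy(x_next, x_now) + _entropy(x_now, s_now)
            - _entropy(x_now) - _entropy(x_next, x_now, s_now))


def net_transfer_entropy(a, b, bins=18):
    """Positive when a drives b more than b drives a."""
    return transfer_entropy(a, b, bins) - transfer_entropy(b, a, bins)
```

Plug-in estimates are biased upward when many bins are sparsely populated, which is one reason long series help; the bias affects both directions, so the net value is less sensitive to it.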

Event synchronization demonstrates very good performance for every radius of interaction and for noise intensities below 0.4; for reference, the single-leader case displayed in Figure 3F has an AUC of 0.97. For low intensities, noise could manifest in the form of local extreme events in the turn rate, which are readily captured by event synchronization. The superior performance of event synchronization with respect to cross-correlation should be attributed to its ability to pick up pairwise leader–follower relationships through varying time delays between extreme events. As noise increases, extreme events become too frequent to establish faithful relationships between the time series.

The different noise intensities at which transfer entropy and event synchronization perform best motivate combining the methods into a more consistent approach to detect leaders in the Vicsek model over a wider range of noise intensities. Figure 5 demonstrates the performance of the combined method, which yields exact classification for any noise intensity below 0.9.

FIGURE 5

Figure 5. AUC obtained by combining the three classification methods shown in Figure 4 to detect the single leader, as a function of the radius of interaction r and the noise intensity η, with N = 5 and v = 0.01.

5. Applications to Fish Collective Behavior

To investigate the applicability of the leader detection approach to fish collective behavior, we select two datasets. First, we generate fish-like trajectories from a random-walk-type model (Gautrais et al., 2012) that successfully predicts group alignment and average distance in barred flagtails (Kuhlia mugil). The data-driven model has five parameters that encapsulate individual swimming, social interactions, and wall interaction. Model parameters are selected based on simulations by Gautrais et al. (2012). A single fish is treated as a leader, such that it does not respond to the rest of the group. Second, we use trajectories from groups of zebrafish in an experiment in which a single fish in each group has been treated with caffeine. In contrast to the trajectories generated using the data-driven model, where leadership is systematically controlled, here we explore whether caffeine treatment induces leadership in zebrafish.

5.1. Data-Driven Simulations

The model proposed by Gautrais et al. (2012) offers a data-driven framework to describe the motion of a group of fish. In this model, the turn rate dynamics of a fish are described as a stochastic process modulated by interactions with the environment, which includes the other members of the group and the tank walls. From the turn rate $ω_{t}^{(i)} (rad s^{- 1})$ of fish i = 1, …, N, one determines the position r⁽ⁱ⁾ and orientation $ϕ_{t}^{(i)}$ with respect to a Cartesian coordinate system in ℝ² as follows:

\begin{align} \frac{d r_{t}^{(i)}}{dt} = v \begin{bmatrix} \cos ϕ_{t}^{(i)} \\ \sin ϕ_{t}^{(i)} \end{bmatrix}, \end{align}

(11a)

\begin{align} \frac{d ϕ_{t}^{(i)}}{dt} = ω_{t}^{(i)}, \end{align}

(11b)

where v is the common, constant speed.

The instantaneous turn rate at time t is modeled by the mean-reverting stochastic differential equation (Gautrais et al., 2012; Calovi et al., 2014)

\begin{align} d ω_{t}^{(i)} = v \left[ - α^{(i)} \left( ω_{t}^{(i)} - {}^{*}ω_{t}^{(i)} \right) dt + σ^{(i)} \, dW_{t}^{(i)} \right], \end{align}

(12)

where α⁽ⁱ⁾ (s⁻¹) is the rate at which the process returns to its steady state and defines the time scale of the response of a fish to any perturbation; $d W_{t}^{(i)}$ is the infinitesimal increment of a standard Wiener process resulting in white noise; and $σ^{(i)} (rad s^{- 3 ∕ 2})$ is a scaling factor of the Wiener process that measures the level of uncertainty in the motion of a fish. The interaction with the environment is captured by the response function $^{*} ω_{t}^{(i)} (rad s^{- 1})$

\begin{align} {}^{*}ω_{t}^{(i)} = k_{W}^{(i)} \frac{\operatorname{sign}(ϕ_{W}^{(i)})}{τ_{W}^{(i)}} + \frac{1}{N} \sum_{j = 1}^{N} \left[ k_{v}^{(i)} v \sin (ϕ_{t}^{(i, j)}) + k_{p}^{(i)} d_{t}^{(i, j)} \sin (θ_{t}^{(i, j)}) \right]. \end{align}

(13)

In equation (13), the first term models wall avoidance and consists of the parameter $k_{W}^{(i)}$, controlling the intensity of wall avoidance; $τ_{W}^{(i)}$, the time to collision; and $ϕ_{W}^{(i)}$, the angle of incidence with the wall. Both the time to collision and the angle of incidence depend on the instantaneous position and orientation of the fish. The second term in equation (13) captures the interaction with the rest of the group. Therein, $k_{p}^{(i)}$ is a parameter controlling the strength of fish attraction toward the group; $d_{t}^{(i, j)}$ and $θ_{t}^{(i, j)}$ are the interindividual distance and relative angle between fish i and fish j, respectively; $k_{v}^{(i)}$ is a parameter controlling the strength of fish alignment with the rest of the group; and $ϕ_{t}^{(i, j)} = ϕ_{t}^{(j)} - ϕ_{t}^{(i)}$.

We simulate 100 realizations of a group of N = 5 fish with a leader. The model is simulated for 120 s using an Euler–Maruyama discretization with a time step of 0.01 s in a circular tank of diameter 4 m. Orientations are initialized uniformly in [−π, π), and positions are initialized uniformly in the circular domain. The parameters of the individual turn rate dynamics are taken from Gautrais et al. (2012), that is, α⁽ⁱ⁾ = 1/0.024 s⁻¹, $σ^{(i)} = 28.9 m^{- 1} s^{- 1/2}$, and v = 0.564 m s⁻¹. These values are based on experimental observations on a group of five subjects. We set the first fish as the leader and set its coupling parameters to zero, that is, $k_{p}^{(1)} = k_{v}^{(1)} = 0$, similar to Butail et al. (2016). For the followers, we use $k_{p}^{(i)} = 0.41 m^{- 1} s^{- 1}$ and $k_{v}^{(i)} = 27 m^{- 1}$ for i = 2, …, 5, to favor coordinated motion, based on results in Zienkiewicz et al. (2015b) and Butail et al. (2016). For all fish, the wall avoidance parameter is set to $k_{W}^{(i)} = 4.7$, which is larger than the value reported in Gautrais et al. (2012) to reflect the coupling values from Butail et al. (2016). Figure 6 shows a segment of the trajectories of the simulated group along with the time evolution of their turn rates, which are used for the leader detection process. The computation of the classifiers is analogous to the analysis of the VM, including the number of bins for the computation of transfer entropy, which is set to 18 (see Figure S2 in Supplementary Material).
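The discretization described above can be sketched as follows, assuming NumPy. This minimal sketch integrates equations (11) and (12) with the Euler–Maruyama scheme but keeps only the social part of equation (13): the wall-avoidance term is omitted because computing the time to collision and the angle of incidence requires tank-geometry conventions not given here, so the simulated fish swim in the unbounded plane. We also assume that the relative angle θ_t^{(i,j)} is the bearing of fish j measured from the heading of fish i and that every turn rate starts at zero; the function name is hypothetical.

```python
import numpy as np


def simulate_fish(n=5, t_end=120.0, dt=0.01, v=0.564, alpha=1 / 0.024,
                  sigma=28.9, k_p=0.41, k_v=27.0, leader=0, radius=2.0,
                  seed=0):
    """Turn rates, shape (steps + 1, n), from Euler-Maruyama integration.

    Social term of equation (13) only; the wall term is omitted (assumption).
    The leader has zero coupling, k_p = k_v = 0.
    """
    rng = np.random.default_rng(seed)
    steps = int(round(t_end / dt))
    kp, kv = np.full(n, k_p), np.full(n, k_v)
    kp[leader] = kv[leader] = 0.0
    # Uniform initial positions in a disk of the given radius (4 m diameter).
    rho = radius * np.sqrt(rng.uniform(size=n))
    ang = rng.uniform(-np.pi, np.pi, size=n)
    pos = np.column_stack((rho * np.cos(ang), rho * np.sin(ang)))
    phi = rng.uniform(-np.pi, np.pi, size=n)
    omega = np.zeros(n)
    out = np.empty((steps + 1, n))
    out[0] = omega
    for k in range(steps):
        dx = pos[None, :, 0] - pos[:, None, 0]  # dx[i, j] = x_j - x_i
        dy = pos[None, :, 1] - pos[:, None, 1]
        dist = np.hypot(dx, dy)
        bearing = np.arctan2(dy, dx) - phi[:, None]  # assumed theta^(i,j)
        align = np.sin(phi[None, :] - phi[:, None])  # sin(phi_j - phi_i)
        target = (kv[:, None] * v * align
                  + kp[:, None] * dist * np.sin(bearing)).sum(axis=1) / n
        dW = rng.standard_normal(n) * np.sqrt(dt)  # Wiener increments
        omega_next = omega + v * (-alpha * (omega - target) * dt + sigma * dW)
        pos = pos + v * dt * np.column_stack((np.cos(phi), np.sin(phi)))
        phi = phi + omega * dt
        omega = omega_next
        out[k + 1] = omega
    return out
```

With `t_end=120.0` and `dt=0.01`, each realization yields 12,001 turn-rate samples per fish, which can be passed to the same pairwise estimators used for the VM.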

FIGURE 6

Figure 6. Two seconds of trajectory traces (A) and turn rate evolution (B) of a group of simulated fish with a leader in red and four followers in green. In the graph, time equal to zero does not correspond to the beginning of the simulation, when fish are uniformly distributed in the circular domain.

In Figure 7A, we illustrate the performance of the three classifiers in detecting leadership in the dataset generated using the data-driven model. All the classifiers detect the leader beyond chance level but, as expected from the analysis of the VM, their performance varies. Net transfer entropy and event synchronization, with AUC values of 0.90 and 0.85, respectively, perform better than cross-correlation, with an AUC value of 0.67. Figure 7B demonstrates the performance of the combined classifier, generated by selecting three points on each classifier's ROC curve as operating points, as indicated in the figure caption. Each point on the combined ROC corresponds to a potential combination that can be used as a classifier for leadership detection. The combined classifier has an AUC value of 0.99, which is superior to that of any individual classifier.
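One standard way to combine thresholded classifiers in a maximum likelihood sense is sketched below under our own assumptions. Each thresholded classifier is treated as a binary decision. Every observed joint output pattern is then ranked by its empirical likelihood ratio P(pattern | leader)/P(pattern | follower), and the ROC is traced by declaring "leader" for the top-ranked patterns; by a Neyman–Pearson argument, this gives the most favorable ROC for those patterns on the data used to estimate the ratios (an optimistic, in-sample estimate). With nine binary decisions (three cutoffs for each of CC, TE, and ES) there are at most 2⁹ = 512 patterns. Whether this matches the exact construction behind Figure 7B is an assumption, and the function name is hypothetical.

```python
import numpy as np


def combined_roc(decisions, is_leader):
    """ROC of the likelihood-ratio combination of binary classifiers.

    decisions: (samples, m) boolean array, one column per thresholded
    classifier; is_leader: (samples,) boolean ground truth.
    Returns (fpr, tpr) running from (0, 0) to (1, 1), up to rounding.
    """
    d = np.asarray(decisions, dtype=bool).astype(np.int8)
    y = np.asarray(is_leader, dtype=bool)
    patterns, inverse = np.unique(d, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # one pattern label per sample
    # Empirical pattern probabilities among leaders and among followers.
    pos = np.bincount(inverse[y], minlength=len(patterns)) / y.sum()
    neg = np.bincount(inverse[~y], minlength=len(patterns)) / (~y).sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(neg > 0, pos / neg, np.inf)
    order = np.argsort(-ratio, kind="stable")  # most leader-like first
    fpr = np.concatenate(([0.0], np.cumsum(neg[order])))
    tpr = np.concatenate(([0.0], np.cumsum(pos[order])))
    return fpr, tpr
```

Stacking the boolean columns `score >= cutoff` for three cutoffs per method gives the nine-column `decisions` input.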

FIGURE 7

Figure 7. ROC curve for the data-driven model of fish behavior with a single leader, generated using each of the three classification methods (A) and a combined approach (B), which integrates the ROCs from the three methods and is plotted using only 20 sampled points out of a total of 512. The combined ROC is obtained by sampling three points on the ROC of each method. In panel (A), the AUC for CC, TE, and ES is estimated at 0.67, 0.85, and 0.90, respectively. In panel (B), the cutoff points are chosen such that the first point is just above the 25th percentile, the second is just above the 50th percentile, and the third is the operating point. The operating point for each individual method is shown as a solid marker, and the other two as open markers. The operating point of the combination of the three classifiers is shown as a solid marker and has ROC coordinates (0.04, 0.95). The AUC of the combined method is 0.99.

In Table 1, we show the performance of the best 20 simple Boolean rules with at most three classifiers, ranked by their distance from the operating point of the combined ROC. For completeness, we display their FPR and TPR. The first five simple Boolean rules have equivalent performance on this synthetic dataset, with an FPR of only 0.08 and a TPR of 0.76.
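Each Boolean rule in Table 1 maps the three thresholded decisions for a candidate to a single decision, from which FPR and TPR follow by counting. The sketch below evaluates the rule ¬ CC ∧ TE ∨ ES, assuming standard operator precedence (negation, then conjunction, then disjunction) and boolean inputs that are already thresholded; the function names are hypothetical.

```python
import numpy as np


def rates(pred, is_leader):
    """False positive rate and true positive rate of boolean predictions."""
    pred = np.asarray(pred, dtype=bool)
    y = np.asarray(is_leader, dtype=bool)
    fpr = np.count_nonzero(pred & ~y) / np.count_nonzero(~y)
    tpr = np.count_nonzero(pred & y) / np.count_nonzero(y)
    return fpr, tpr


def not_cc_and_te_or_es(cc, te, es):
    """Rule (not CC and TE) or ES on thresholded classifier outputs."""
    cc, te, es = (np.asarray(a, dtype=bool) for a in (cc, te, es))
    return (~cc & te) | es
```

Other rules in the table can be written the same way as small boolean expressions over the three decision arrays.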

TABLE 1

Table 1. Performance of 20 select Boolean rules on the synthetic dataset of data-driven model of fish social behavior.

5.2. Experiments on Pharmacologically Treated Zebrafish

To demonstrate the use of our approach in the study of experimental data on animal behavior, we investigate the possibility that the administration of a psychostimulant compound could elicit leadership in a group of fish. Specifically, we consider experimental data collected by our group (submitted work; data available upon request) on the collective behavior of caffeine-treated zebrafish swimming in a shallow-water circular tank. The experimental procedure was carried out under protocol number 13-1424, approved by the University Animal Welfare Committee (UAWC) of New York University. In the literature, a number of studies have explored the effects of this psychoactive compound on the individual behavior of this popular animal model, but the effect of caffeine on zebrafish social behavior has yet to be fully understood (García-Pardo et al., 2015).

In our experiment, we test 10 groups of five fish, in which only one subject per group is treated with caffeine at a concentration of 25 mg/l. Fish motion is recorded from an overhead view at 40 frames per second for 5 min. A Daubechies wavelet filter is first applied to the fish centroid positions, and the turn rate of each fish, $ω_{t}^{i}$ with i = 1, …, 5, is then estimated from the curvature of the trajectory (Mwaffo et al., 2015b). Following Butail et al. (2016), data are down-sampled to a sampling period of 0.2 s to minimize the effect of measurement noise on the interactions. The number of bins is set to 18 for consistency with the simulation results presented earlier.
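The turn-rate estimate from trajectory curvature can be sketched with the planar-curve identity ω = (ẋÿ − ẏẍ)/(ẋ² + ẏ²), that is, signed curvature times speed. This sketch uses central finite differences from `np.gradient` and omits the Daubechies wavelet filtering step, so it is not the procedure of Mwaffo et al. (2015b). It also assumes that down-sampling keeps every eighth sample of the 40 frames-per-second turn-rate series to reach the 0.2 s period. The function names are hypothetical.

```python
import numpy as np


def turn_rate_from_positions(x, y, fps=40.0):
    """Signed turn rate (rad/s) from planar positions via trajectory curvature.

    omega = (x' y'' - y' x'') / (x'^2 + y'^2), with derivatives from central
    finite differences; no smoothing is applied here.
    """
    dt = 1.0 / fps
    vx, vy = np.gradient(x, dt), np.gradient(y, dt)
    ax, ay = np.gradient(vx, dt), np.gradient(vy, dt)
    speed2 = vx**2 + vy**2
    # Undefined (NaN) where the fish is momentarily stationary.
    return (vx * ay - vy * ax) / np.where(speed2 > 0, speed2, np.nan)


def downsample(series, fps=40.0, period=0.2):
    """Keep one sample per `period` seconds (every 8th frame at 40 fps)."""
    step = int(round(period * fps))
    return series[::step]
```

Counterclockwise motion yields positive turn rates and clockwise motion negative ones; the first and last few samples are less accurate because `np.gradient` switches to one-sided differences at the edges.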

To implement our method on experimental data of fish treated with caffeine, we select the Boolean rule ¬ CC ∧ TE ∨ ES in Table 1. This selection is based on two reasons: (i) this Boolean rule shows the best performance on the synthetic data generated by the data-driven model of fish social behavior, as shown in Table 1; and (ii) it combines TE and ES, which are found to complement each other in the classification of leaders and followers in the VM over the entire parameter space, as shown in Figure 4. Although the other four best rules in Table 1 have the same performance on the simulated dataset, they do not use TE, which is important for detecting leaders in instances of the VM characterized by limited coordination between the particles. The thresholds of CC, TE, and ES used to implement the Boolean rule on experimental data are obtained from the ROCs for the synthetic data generated by the data-driven model of fish social behavior. Specifically, we scale the operating points on those ROCs by the maximum values of CC, TE, and ES in the simulation and apply these thresholds to the experimental data, which are also scaled by their corresponding maximum values.
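The transfer of thresholds from simulation to experiment amounts to comparing max-normalized scores. A minimal sketch, assuming that the maximum of each score is positive (a negative maximum would flip the inequality) and that a classifier declares a leader when its normalized score is at or above the normalized threshold; the function name is hypothetical.

```python
import numpy as np


def transfer_threshold(sim_scores, sim_threshold, exp_scores):
    """Apply a simulation-derived threshold to experimental scores.

    Both datasets are normalized by their own maximum value, and the
    decision is exp / max(exp) >= sim_threshold / max(sim).
    """
    sim_max = np.max(sim_scores)
    exp = np.asarray(exp_scores, dtype=float)
    return exp / np.max(exp) >= sim_threshold / sim_max
```

Because both sides are normalized, the decisions do not change if the experimental scores are rescaled by a positive constant.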

In Table S1 in Supplementary Material, we summarize the results of the combined detection rule. In all 10 experiments, the Boolean rule ¬ CC ∧ TE ∨ ES identifies the caffeine-treated fish as the leader of the group. By comparing the fraction of experiments in which the treated fish is identified as a leader (10/10) with chance (1/5) using a t-test, we cannot dismiss the hypothesis that caffeine treatment is a determinant of leadership (t(9) = 1, p < 0.01). This result could be explained by the psychostimulant effects of caffeine, which, similar to other psychoactive compounds such as lysergic acid diethylamide and 3,4-methylenedioxymethamphetamine, might modulate social responsiveness (Shams and Gerlai, 2016). We may also propose that caffeine could enhance fish activity and increase the frequency of fast and sudden turning maneuvers (Wong et al., 2010; Gupta et al., 2014). It is possible that the hyperactivity of the treated fish could be perceived by untreated fish as an indicator of fitness, boldness, or high social status, thereby favoring its appraisal as a group leader (Ladu et al., 2014).
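As a complementary check on the comparison with chance, separate from the t-test reported above, the exact one-sided binomial probability that the caffeine-treated fish is singled out in all 10 groups by chance can be computed directly. This sketch assumes independent groups and a chance probability of 1/5 per group, as in the text; the helper name is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Probability of 10 or more successes in 10 groups at chance level 1/5.
p_value = binomial_tail(10, 10, 1 / 5)  # equals (1/5)**10, about 1.024e-7
```

The same helper gives the tail probability for any other count of groups in which the treated fish is identified as the leader.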

6. Conclusion

Here, we investigate the possibility of detecting leaders in animal groups from the raw position data of each individual. Our approach to leadership detection builds on measuring pairwise interactions between each pair of individuals to isolate the individuals that exert the maximum net influence over the rest of the group, based on receiver operating characteristic (ROC) curves. Pairwise interactions are quantified using three independent methods (cross-correlation, transfer entropy, and event synchronization) that are integrated to maximize our success in identifying leaders from raw data. In the technical literature, each of these methods has been found to have differential success in the study of connectivity patterns; we hypothesize that their combination in a maximum likelihood sense would help bring to light their specific advantages and mitigate their limitations.

We demonstrate our approach through the systematic study of self-propelled particles described by the classical Vicsek model (Vicsek et al., 1995), in which particles update their orientation as a function of their neighbors and additive noise. The leader is modeled as a particle that has additional knowledge about a specific direction to take, thereby maintaining its orientation irrespective of the rest of the group. We systematically elucidate the role of the radius of interaction and the noise intensity on the success of each of the three methods in detecting the leader. While cross-correlation typically fails to accurately identify the leader, the combination of transfer entropy and event synchronization demonstrates excellent performance for any parameter selection. From raw time series, we show the possibility of exactly detecting a leader from small to large noise intensities, encapsulating disordered and ordered patterns, and from small to large radii of interaction, describing sparse to fully connected networks of followers. The possibility of successfully detecting a single leader is not masked by introducing mild heterogeneities in the groups.²

Based on the success of our combined approach, we tackle two realistic datasets of fish social behavior. First, we demonstrate the ability to detect a leader in a synthetic dataset generated using a data-driven model (Gautrais et al., 2012; Calovi et al., 2014), in which the turn rate of each fish is described as a mean reverting diffusion process. Through our combined approach, we are successful in precisely isolating the leader from the rest of the group. Next, we study an experimental dataset on pharmacologically treated fish, in which one of the subjects is administered caffeine to elicit a psychostimulant effect that could enhance activity and trigger leadership. In agreement with the premise of the experiment, through the application of our combined approach, we find that caffeine-treated subjects are more likely to emerge as leaders of the group.

Our approach of identifying leaders via the strength of interactions over the experimental time assumes that the leaders are consistent throughout the entire observation, in time and in space, which may not always be the case (Nakayama et al., 2012). When this assumption loses validity, one may partition the observation into contiguous segments and apply the approach separately to each segment. If data are available at high resolution, the analysis should reveal how leadership varies in the group during the observation.

Another important assumption of our approach is that a group member can be either a leader or a follower, which may not always be the case (Rosenthal et al., 2015). Although it is possible to mark an interaction as leaderless based on the value of the interaction strength, computing the baseline for such values may require experiments that tie leadership to other personality-based traits. Understanding the number of leaders that the method can detect is also an area that requires further research. While our method is able to identify single leaders in small and large groups,³ its applicability to the study of groups with multiple leaders may pose some technical challenges due to the possibility of large correlation lengths and group splits (DeLellis et al., 2013).

Further, leaders in our simulated datasets assume a singular role in the group, whereby they are not influenced by the rest of the individuals. A scenario may exist where leaders could act on information provided by a subset of neighbors, designated as informed followers, in the absence of consensus (Cucker and Huepe, 2008). It is likely that in such scenarios, the interaction strength will be lowered as compared to the directed relationships simulated here, thereby challenging the process of inference based on ROC curves.

This study strengthens our methodological toolbox for studying leadership in animal groups by providing analysts with a model-free framework to investigate the basis and determinants of leadership. This effort significantly expands on our previous work (Butail et al., 2016), which is limited to pairs and does not offer a methodology to inform the selection of a classifier. Here, we address both issues through a novel method that aggregates the pairwise interactions underlying social behavior in groups and combines different classifiers to improve the success of discovering leaders. Although our definition of leadership is based on turn rate, it could, in principle, be extended to other observables such as linear acceleration, which is a salient control variable for other fish species (Fish et al., 1991) that exhibit burst-and-coast motion.

Ethics Statement

The experimental procedure was carried out under protocol number 13-1424, approved by the University Animal Welfare Committee (UAWC) of New York University.

Author Contributions

All the authors designed the study, performed the analysis of the data, and wrote the manuscript.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Funding

This work was supported by the National Science Foundation under Grant numbers # CMMI-1433670 and # CMMI-1505832, the Mitsui USA Foundation, and the Army Research Office under Grant number # W911NF-15-1-0267, with Drs. Samuel C. Stanton and Alfredo Garcia as the program managers.

Supplementary Material

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/frobt.2017.00035/full#supplementary-material.

Footnotes

^Selecting three points per ROC results in 9 binary classifiers to combine, for a total of 2⁹ points on the combined ROC.
^We tested our approach with a group of 5 simulated fish whose parameters were chosen within ±10% of their nominal values used to generate Figure 7. Our results show similar performance for each classifier as well as the improvement in performance from the combined classifier—see Figure S3 in Supplementary Material.
^We evaluated our approach with a group of 20 simulated fish, which shows similar performance for each classifier as well as the improvement in performance from the combined classifier—see Figure S4 in Supplementary Material.

References

Aldana, M., Dossetti, V., Huepe, C., Kenkre, V., and Larralde, H. (2007). Phase transitions in systems of self-propelled agents and related network models. Phys. Rev. Lett. 98, 095702. doi:10.1103/PhysRevLett.98.095702


Anderson, R. P., Jimenez, G., Bae, J. Y., Silver, D., Macinko, J., and Porfiri, M. (2016). Understanding policy diffusion in the US: an information-theoretical approach to unveil connectivity structures in slowly evolving complex systems. SIAM J. Appl. Dyn. Syst. 15, 1384–1409. doi:10.1137/15M1041584