How Does the Fusion of Sensory Information From Audition and Vision Impact Collective Behavior?

The present study investigates how combined information from audition and vision impacts group-level behavior. We consider a modification of the original Vicsek model in which individuals use auditory and visual sensing modalities to gather information from neighbors and update their heading directions. In this model, the information from visual and auditory cues can be weighted differently. In a simulation study, we examine the sensitivity of the emergent group-level behavior to the weight assigned to each sensing modality in this weighted composite model. Our findings suggest that combining sensory cues may play an important role in collective behavior, and results from the composite model indicate that the group-level features of pure audition predominate.


INTRODUCTION
Collective behavior in animal groups refers to the formation of group-level patterns from local interactions. Specifically, each individual in the group acts on the information it receives by interacting with local neighbors. As a result, coordinated motion emerges in the absence of any centralized control. Collective behavior is commonly observed across biological systems, for example, in ant colonies [1], fish schools [2,3], mosquito swarms [4], and bird flocks [5]. A key benefit of living in a group is access to a higher level of information, which helps social animals to locate food sources [6], avoid predators [7], and find mates [8].
To study the mechanisms that produce group-level patterns from local interactions, a variety of models that simulate group coordination have been proposed in the literature [9][10][11][12][13]. Mathematical models of collective behavior follow different approaches: treating the system as a continuous medium [9], formulating it in continuous time [11], or treating it as a collection of agents interacting in discrete time [14]. A popular agent-based model for studying collective behavior is the Vicsek model [15], which assumes behavioral rules at the individual level. Precisely, in the original Vicsek model, each individual moves with a constant speed within a two-dimensional confined space and aligns itself with the average direction of its neighbors, subject to an intrinsic noise that models its free will. Besides the intrinsic noise, one can consider an extrinsic noise, which models errors resulting from incorrect assumptions about the environment or about others' information [16]. The neighbors in the original Vicsek model are the individuals residing inside a circular sensing region centered at the given individual's current position. The simulation results show a phase transition from a random disordered state to an ordered state as the number of individuals or the noise strength is varied. The effect of intrinsic and extrinsic noise on the phase transition has been studied in [16]; the results indicate that the phase transition is continuous under intrinsic noise but discontinuous in the presence of extrinsic noise. Because of its simplicity, the original Vicsek model has been adopted as a starting point for many variants that include biologically relevant features, for example, both attractive and repulsive interactions [17,18] and a generalization to three dimensions [19].
Moreover, because real-world biological swarms may not have an omnidirectional view, several studies restrict the field of vision from a circular disk to a sector [20][21][22].
Most collective behavior models implicitly assume that individuals communicate using a sensing modality analogous to vision [20,21,23]. However, some social animals, such as bats and dolphins, use auditory cues for communication [24]. Models implementing audition-based interaction have received limited attention in the literature; examples include a model inspired by acoustic sensing in midges [25] and one inspired by echolocation in bats [26]. The study in [27] employs auditory sensing within a modified Vicsek model and examines the differences in group-level behavior by comparing simulation results with a model that uses purely visual sensing. In [27], the auditory sensory system is modeled as a sector of a circle, based on a well-characterized directivity pattern observed in the formation of ultrasonic beams and inspired by biological systems [28]. The auditory and visual modes in the model of [27] define neighbors differently, and the results show that, relative to the visual mode, the auditory mode produces higher alignment and lower aggregation.
Although efforts have been made to study group behavior under individual sensing cues, few models in the literature consider the fusion of stimulus information from multiple sensing cues. At the same time, evidence from real-world studies [28][29][30] suggests that bats communicate and navigate using multimodal sensing, including audition, vision, somatosensory and vestibular perception, and chemoreception. Furthermore, being equipped with both vision and audition, bats gather complementary information; for example, vision helps detect long-range objects, while audition helps detect small ones with great accuracy [30]. More examples of behavioral studies of bats reporting multisensory integration of information can be found in [30][31][32][33][34].
Multisensory integration has several advantages; for example, previous research shows that it reduces reaction times [35]. The study in [31] shows that bats benefit from multimodal sensing, since they perform worse at avoiding obstacles when the ambient light is reduced. Moreover, the empirical evidence in [29] shows that bats alter their flight even before they hear their neighbors' echoes, which indicates that vision affects their flight behavior. Several studies [32,33,36] show that bats continuously use two sensory modalities to find prey. According to [37], the high visual acuity and angular resolution of megabats make vision their preferred mode of navigation. The study on Egyptian fruit bats in [38] shows that bats increase the rate and intensity of their echolocation at low light levels, when their visual abilities are limited, suggesting that at times vision influences echolocation. All this empirical evidence suggests that bats utilize different sensory modalities and benefit from multisensory integration, which compensates for information that is not accessible via a single modality. Therefore, depending on the task, information from different sensory modalities may be weighted differently.
It is, therefore, crucial to improve our understanding of how multimodal sensing impacts group behavior. To this end, a graph-theoretic approach utilizing consensus and synchronization protocols is proposed in [39,40] to analyze the impact of more than one sensing modality. However, the approach in [39,40] ignores the spatial distribution of individuals. In the present work, we introduce, for the first time, a composite model that allows information from auditory and visual sensing cues to be weighted differently within a two-dimensional Vicsek model. We conduct simulations to understand how the relative strength of these sensory cues influences group behavior, measured in terms of three different order parameters.

MODELING WEIGHTED AUDITORY AND VISUAL SENSING MODALITIES
This section describes the novel implementation of the weighted auditory and visual sensing modalities within a modified Vicsek model and defines the order parameters that capture the behavior at the group-level.

Original Vicsek Model
The original Vicsek model comprises N particles moving in a two-dimensional square box of size L × L with periodic boundary conditions; the average particle density is $\rho = N/L^2$. All particles update their positions and heading directions in discrete time. The initial positions and velocity directions of the N particles are drawn from uniform distributions over the square box and over the range [0, 2π], respectively. For the ith particle at time step k, the vectors $x_i^k \in \mathbb{R}^2$ and $v_i^k \in \mathbb{R}^2$, $i \in \{1, \ldots, N\}$, denote the position and unit velocity, respectively. The position of the ith particle at time step k + 1 updates as

$$x_i^{k+1} = x_i^k + v_0\, v_i^k, \tag{1}$$

where $v_0$ denotes the speed of the particles, which is assumed to be constant for all particles and for all time. To update its heading direction, at every time step each particle is subjected to a short-range interaction and assumes the average heading direction of itself and its neighbors, with an error term characterized by a random noise. In the original Vicsek model, the neighbors of a given particle are the other particles located within a circular region of radius r around it. This circular region acts as a sensing region: a particle can sense the presence of other particles only within it. The interaction of particle i with its neighbors is modeled in terms of the heading angle $\theta_i^{k+1}$ at time step k + 1, which updates as

$$\theta_i^{k+1} = \operatorname{atan2}\Big(\sum_{j \in \Lambda_i^k} \sin\theta_j^k,\ \sum_{j \in \Lambda_i^k} \cos\theta_j^k\Big) + \Delta\theta_i^k, \tag{2}$$

where $\Lambda_i^k$ denotes the index set of neighbors of particle i, including itself, and $\Delta\theta_i^k$ denotes a random variable uniformly distributed in the interval [−η/2, η/2], where η is the noise intensity. The Vicsek model assumes periodic boundary conditions, which mitigate edge effects. For a two-dimensional system, this means that the individuals move on a torus. Periodic boundary conditions help to minimize finite-size effects caused by finite boundaries in numerical simulations.
Another assumption of the Vicsek model is that the speed of the individuals is constant. The velocities of the individuals can nevertheless differ, because the heading directions are updated according to the interaction and noise terms in Eq. 2.
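The update rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the simulation code used in the study: the function name `vicsek_step`, the list-based data layout, the unit time step, and the choice to advance positions with the heading from the current step are all assumptions made for the example.

```python
import math
import random

def vicsek_step(pos, theta, L, r, v0, eta, rng):
    """One synchronous update of the original Vicsek model (illustrative sketch)."""
    n = len(pos)
    new_theta = []
    for i in range(n):
        sx = sy = 0.0
        for j in range(n):
            # Minimum-image displacement on the periodic L x L box (torus).
            dx = (pos[j][0] - pos[i][0] + L / 2) % L - L / 2
            dy = (pos[j][1] - pos[i][1] + L / 2) % L - L / 2
            if dx * dx + dy * dy <= r * r:  # j == i is included
                sx += math.cos(theta[j])
                sy += math.sin(theta[j])
        # Average neighbor direction plus uniform noise in [-eta/2, eta/2].
        new_theta.append(math.atan2(sy, sx) + rng.uniform(-eta / 2, eta / 2))
    # Assumption: positions advance along the heading held at the current step.
    new_pos = [((x + v0 * math.cos(t)) % L, (y + v0 * math.sin(t)) % L)
               for (x, y), t in zip(pos, theta)]
    return new_pos, new_theta
```

With η = 0, two particles within distance r of each other (including across the periodic boundary) leave the step with the same heading, which is a convenient sanity check for the wrap-around logic.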

Modified Vicsek Model
Here we consider a modified version of the original Vicsek model in which the particles use two distinct sensing modalities, inspired by audition and vision, to interact with other particles in the system. Moreover, we introduce a weighted update protocol in which the particles can ascribe different interaction strengths to these two sensing modalities. Below we describe the implementation of the auditory and visual sensing modalities and the weighted update protocol.
The auditory and visual sensing modalities are incorporated as in [27]. In the visual sensing mode, each particle is assumed to have a sensing region similar to a field of vision, represented as a sector enclosed by two radii of length r and a central angle 2ϕ, where ϕ denotes the sensing angle. The sensing angle can vary from 0 to π, and the sector is assumed to be symmetric about the individual's current heading direction. The visual neighbors of the ith particle at time step k, denoted ${}^{v}\Lambda_i^k$, are defined as the particles that reside within its field of vision. In the auditory sensing mode, each particle is assumed to have an acoustic beam, represented as a sector of a circle. The acoustic beam is modeled in the same way as the field of vision and has a central angle 2ϕ, where the sensing angle ϕ can vary from 0 to π. The auditory neighbors of the ith particle at time step k, denoted ${}^{a}\Lambda_i^k$, are defined as the particles whose beams are occupied by the ith particle. Thus particle i resides within the acoustic coverage of its auditory neighbors and can "hear" from them. Note that when the sensing angle ϕ = π, the interaction neighborhood becomes a circle, and neighbor sensing is identical for the visual and auditory modes.
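The asymmetry between the two neighbor definitions is easy to miss, so a small Python sketch may help. It assumes a periodic box of side `L`, headings stored as angles, and that each particle counts itself as a neighbor in both modes; the helper names `wrap`, `in_sector`, `visual_neighbors`, and `auditory_neighbors` are hypothetical.

```python
import math

def wrap(d, period):
    """Map d into [-period/2, period/2)."""
    return (d + period / 2) % period - period / 2

def in_sector(src, src_theta, tgt, r, phi, L):
    """True if tgt lies in the sector of radius r and half-angle phi
    centered on src's heading (periodic box of side L)."""
    dx = wrap(tgt[0] - src[0], L)
    dy = wrap(tgt[1] - src[1], L)
    dist = math.hypot(dx, dy)
    if dist > r:
        return False
    if dist == 0.0:
        return True
    off = abs(wrap(math.atan2(dy, dx) - src_theta, 2 * math.pi))
    return off <= phi

def visual_neighbors(i, pos, theta, r, phi, L):
    # Particles that i can see: they lie inside i's own sector.
    return {j for j in range(len(pos))
            if j == i or in_sector(pos[i], theta[i], pos[j], r, phi, L)}

def auditory_neighbors(i, pos, theta, r, phi, L):
    # Particles that i can hear: i lies inside *their* beams.
    return {j for j in range(len(pos))
            if j == i or in_sector(pos[j], theta[j], pos[i], r, phi, L)}
```

For three particles in a line that all face the same way, the middle one sees the particle ahead of it but hears the particle behind it; at ϕ = π the two sets coincide, as noted in the text.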
In the modified Vicsek model, we keep the position update protocol of Eq. 1 and introduce a weighted protocol, different from Eq. 2, to update a particle's heading direction, which allows the information from the above two sensory modalities to be weighted differently. In this weighted protocol, the heading angle of particle i at time step k + 1 updates as

$$\theta_i^{k+1} = \operatorname{atan2}\Big(\alpha_v \sum_{j \in {}^{v}\Lambda_i^k} \sin\theta_j^k + \alpha_a \sum_{j \in {}^{a}\Lambda_i^k} \sin\theta_j^k,\ \ \alpha_v \sum_{j \in {}^{v}\Lambda_i^k} \cos\theta_j^k + \alpha_a \sum_{j \in {}^{a}\Lambda_i^k} \cos\theta_j^k\Big) + \Delta\theta_i^k, \tag{3}$$

where $\alpha_v$ and $\alpha_a$ are coupling parameters representing the interaction weights ascribed to the visual and auditory sensing modes, respectively, and take values between zero and one such that $\alpha_v = 1 - \alpha_a$. A schematic in Figure 1 describes the auditory and visual sensing schemes, and an example is used to demonstrate the weighted update protocol for the modified Vicsek model. Note that, as in the original Vicsek model (Eq. 2), we retain the stochastic component $\Delta\theta_i^k$ in the present composite model. This term refers to an error or an individual's free will in deciding on its new direction. The update protocol in Eq. 3 is a generalization of our previous work in [27], which solely focuses on pure vision ($\alpha_v = 1$) and pure audition ($\alpha_a = 1$). Moreover, using the protocol in Eq. 3, we can consider an equal contribution of audition and vision by choosing equal coupling weights ($\alpha_v = \alpha_a = 0.5$), as well as vary the relative interaction strength by choosing unequal weights ($\alpha_v \neq \alpha_a$).
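A compact way to read the weighted protocol is as a weighted sum of two resultant vectors, one per sensing mode. The Python sketch below assumes that the weights multiply the unnormalized sums of unit heading vectors over each neighbor set, which matches the vectorial addition described for Figure 1; the function name `weighted_heading` and its arguments are illustrative, not taken from the study's code.

```python
import math
import random

def weighted_heading(theta, vis, aud, alpha_a, eta, rng):
    """Heading update for one particle under the weighted protocol (sketch).

    theta   : list of current heading angles of all particles
    vis/aud : index sets of visual / auditory neighbors (each includes self)
    alpha_a : auditory weight; the visual weight is 1 - alpha_a
    """
    alpha_v = 1.0 - alpha_a
    # Assumption: weights scale the unnormalized resultant vector of each mode.
    sx = (alpha_v * sum(math.cos(theta[j]) for j in vis)
          + alpha_a * sum(math.cos(theta[j]) for j in aud))
    sy = (alpha_v * sum(math.sin(theta[j]) for j in vis)
          + alpha_a * sum(math.sin(theta[j]) for j in aud))
    return math.atan2(sy, sx) + rng.uniform(-eta / 2, eta / 2)
```

Setting `alpha_a` to 0 or 1 recovers the pure-vision and pure-audition updates, and intermediate values interpolate between the two resultant vectors.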
A preliminary study of the simultaneous use of visual and auditory sensing was presented in [41], which introduced a composite model that combines the sensory neighbors from pure vision and pure audition while keeping the update protocol similar to Eq. 2 of the original Vicsek model. In other words, the interaction weight is kept constant at one, irrespective of audition and vision. This leads to an important distinction between the present weighted composite model and the earlier composite model in [41]. Specifically, to consider an equal contribution from audition and vision in the present model, we set both coupling weights to 0.5; thus, the total weight ascribed to the heading angle of a particle j is one only if j qualifies as both an auditory and a visual neighbor. This is because, in such a scenario, both ${}^{v}\Lambda_i^k$ and ${}^{a}\Lambda_i^k$ include particle j in Eq. 3. On the other hand, in the composite model of [41], the coupling weight is always one, regardless of whether the neighbors are auditory, visual, or both simultaneously.
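The distinction between the two composite models comes down to the weight each neighbor's heading receives. The short Python sketch below spells this out under one reading of the text: in the weighted model a neighbor contributes the visual weight if seen plus the auditory weight if heard, while in the earlier composite model any neighbor in the combined set contributes with weight one. Both function names are hypothetical.

```python
def weight_in_weighted_model(j, vis, aud, alpha_a):
    """Total weight on particle j's heading in the weighted composite model."""
    alpha_v = 1.0 - alpha_a
    return alpha_v * (j in vis) + alpha_a * (j in aud)

def weight_in_earlier_model(j, vis, aud):
    """Weight on particle j's heading when neighbor sets are simply combined
    (assumed reading of the earlier composite model)."""
    return 1.0 if (j in vis or j in aud) else 0.0
```

With equal weights of 0.5, a neighbor that is only seen or only heard contributes half as much as one that is both seen and heard, whereas the earlier model treats all three cases alike.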

Order Parameters
To characterize the collective behavior that emerges from the protocols in Eqs 1, 3, we consider three observables, also termed order parameters. The first observable is polarization, a measure of group alignment, defined at each time step k as the absolute value of the average normalized velocity and calculated as

$$P^k = \frac{1}{N} \Big\| \sum_{i=1}^{N} v_i^k \Big\|,$$

where $\|\cdot\|$ denotes the vector norm. Polarization is a scalar quantity that ranges between zero and one. Note that, in the perfectly ordered state, all the particles achieve a common heading direction and thus $P^k = 1$, whereas in the completely disordered state the particles move in random directions and thus $P^k$ is close to zero.
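As a quick illustration of the definition, polarization can be computed directly from the heading angles, since each unit velocity is (cos θ, sin θ). The function name `polarization` is chosen for the example only.

```python
import math

def polarization(theta):
    """Norm of the mean unit-velocity vector for a list of heading angles."""
    n = len(theta)
    sx = sum(math.cos(t) for t in theta)
    sy = sum(math.sin(t) for t in theta)
    return math.hypot(sx, sy) / n
```

A fully aligned group gives a value of one, and two particles heading in opposite directions give a value of zero up to floating-point error.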
To investigate the spatial distribution of the particles with respect to the overall center of mass, we consider a second order parameter called cohesion. To compute cohesion, we first find the group's center of mass, $X^k = \frac{1}{N}\sum_{i=1}^{N} x_i^k$, and then calculate the position of each particle relative to it, $x_i^k - X^k$. Next, we define cohesion at each time step k as

$$C^k = \frac{1}{N} \sum_{i=1}^{N} \exp\Big(-\frac{\|x_i^k - X^k\|}{l_a}\Big),$$

where the scaling coefficient is $l_a = 4r$, consistent with the study in [11]. Like polarization, cohesion takes values between zero and one: a value of one corresponds to all particles congregating at the center of mass, while small values indicate that the particles are scattered far from the center of mass. Note that, because we perform the simulations in a finite arena with periodic boundary conditions, the system cannot attain a cohesion value of zero. Finally, the third observable we consider is the largest cluster size of the system. A collection of particles belongs to a cluster when each particle in the cluster is connected to every other particle through a series of undirected edges drawn between each particle and its neighbors. If there are multiple such clusters at a given time, we only consider the size of the largest cluster, $S^k$, which is the total number of particles in it. Therefore, when all particles are in the same cluster, the size of the largest cluster equals the size of the system, and this is the quantity we select as the order parameter.
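Both remaining observables can be sketched briefly in Python. The cohesion function assumes an exponential decay of each particle's distance from the center of mass, scaled by 4r, which is one form consistent with the stated range and limits; the cluster function treats neighbor relations as undirected edges and uses a union-find structure to count connected components. Function names and the input layout are assumptions of the example.

```python
import math

def cohesion(pos, r):
    """Mean of exp(-distance to center of mass / (4 r)); assumed functional form."""
    n = len(pos)
    cx = sum(x for x, _ in pos) / n
    cy = sum(y for _, y in pos) / n
    la = 4.0 * r
    return sum(math.exp(-math.hypot(x - cx, y - cy) / la) for x, y in pos) / n

def largest_cluster(n, neighbor_sets):
    """Size of the largest connected component; neighbor_sets[i] holds the
    indices of particle i's neighbors, treated as undirected edges."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, nbrs in enumerate(neighbor_sets):
        for j in nbrs:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
    counts = {}
    for i in range(n):
        root = find(i)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values())
```

Because edges are undirected, a one-way auditory or visual link between two particles is enough to place them in the same cluster.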

SIMULATION RESULTS AND DISCUSSION
Next, we conduct numerical simulations to study the group-level behaviors in terms of the order parameters defined above. For our simulations, we choose the length of the square box L = 10, the constant speed $v_0 = 0.03$, and the radius of the sensing sectors for both audition and vision r = 1. We further set ρ = 10 and vary the control parameters: the noise intensity η, the sensing angle ϕ, and the coupling parameter $\alpha_a$. Note that changing $\alpha_a$ automatically changes $\alpha_v$ through the relation $\alpha_v = 1 - \alpha_a$. The main focus of the present work is to study the differences in group-level behavior as we vary the relative interaction strengths of audition and vision. Accordingly, we assume that the two sensing sectors corresponding to vision and audition are geometrically similar, so that the observed differences in group behavior are due to the relative coupling weights rather than to differences in the sensing regions. In other words, for a given simulation, we use the same r and ϕ for audition and for vision. Although the sensing sectors for audition and vision are geometrically similar, the set of auditory neighbors differs from the set of visual neighbors because of how the two are defined. Finally, the particles' initial positions and heading directions are randomly generated within the square box and on the unit circle, respectively, with uniform distributions. We generate the initial conditions once and keep them identical for all our simulations. We run each simulation for 50,000 time steps and record the positions and velocities of all the particles after excluding the initial transient of 10,000 time steps. Next, we compute the mean polarization, mean cohesion, and mean largest cluster size, averaged over the remaining 40,000 time steps. Figure 2 presents the results for the mean polarization at different values of $\alpha_a$.
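The simulation settings above fix the number of particles through the density relation ρ = N/L², giving N = 1000. A short Python sketch of these settings and of the transient-discarding time average follows; the constant and function names are illustrative.

```python
# Fixed simulation parameters stated in the text.
L = 10.0      # side of the square box
RHO = 10.0    # particle density
V0 = 0.03     # constant speed
R = 1.0       # sensing radius for both modes

# Number of particles implied by rho = N / L^2.
N = round(RHO * L ** 2)

T_TOTAL = 50_000      # total time steps per run
T_TRANSIENT = 10_000  # initial steps excluded from the averages

def time_average(series, transient=T_TRANSIENT):
    """Mean of a per-step order parameter after dropping the transient."""
    kept = series[transient:]
    return sum(kept) / len(kept)
```

Each reported mean order parameter would then be `time_average` applied to the corresponding per-step series of length `T_TOTAL`.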
FIGURE 1 | (A) The schematic shows the auditory and visual sensing schemes for a target individual (white circle). The "visual neighbors" are the individuals that the target "can see" and are marked using cross symbols. The "auditory neighbors" are the individuals that the target "can hear from" and are marked with plus symbols. In the present model, the auditory and visual sensing sectors are assumed to be geometrically similar, having the same r and ϕ for a given simulation. (B) The vectorial addition of the heading directions from the two independent sensing modes, vision (top) and audition (bottom). Note that the heading direction of the target (dashed orange vector) appears in the vectorial addition of both modes, so that the total contribution from the target itself is one. The resultant vectors from vision (green solid vector) and audition (blue solid vector) average the heading directions of the visual and auditory neighbors, respectively. (C) An example heading direction update at the next time step using the weighted protocol, with $\alpha_a = 2/3$ and $\alpha_v = 1/3$. The heading direction of the individual is computed by scaling the resultant vector of vision by one-third (one-third the length of the green solid vector in B) and the resultant vector of audition by two-thirds (two-thirds the length of the blue solid vector in B), followed by vectorial addition. The solid orange arrow is the updated heading direction of the target at the next time step.

Within each sub-figure of Figure 2 (A to J), the noise intensity η is varied along the vertical axis from 0.1 to 1 in increments of 0.1 and then from 1 to 5 in increments of 1, and ϕ is increased along the horizontal axis from 0 to π in constant increments of π/15. To study the dependence of mean polarization on the coupling parameter, we choose a range of $\alpha_a$ values from 0 (pure vision, Figure 2A) to 1 (pure audition, Figure 2J). Figures 3, 4 show the results for mean cohesion and mean largest cluster size, respectively.
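For reference, the sweep over noise intensity and sensing angle described above can be written out explicitly. This Python sketch only enumerates the grid; the variable names are illustrative.

```python
import math

# Noise intensity: 0.1 to 1 in steps of 0.1, then 2 to 5 in steps of 1.
eta_values = [round(0.1 * k, 1) for k in range(1, 11)] + [2.0, 3.0, 4.0, 5.0]

# Sensing angle: 0 to pi in steps of pi/15.
phi_values = [k * math.pi / 15 for k in range(16)]

# Every (eta, phi) pair shown in one sub-figure.
grid = [(eta, phi) for eta in eta_values for phi in phi_values]
```

This gives 14 noise levels and 16 sensing angles, so each sub-figure summarizes 224 parameter combinations for one value of the auditory weight.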
Observing each sub-figure in Figure 2, we identify that polarization is zero when ϕ = 0 and that polarization increases with increasing ϕ, as prominently observed in Figure 2A with $\alpha_a = 0$. Note that when ϕ = 0, particles do not have a sensing region and thus do not interact, resulting in a random walk with a polarization value of zero. Increasing ϕ results in more interactions, which helps the group achieve alignment. In the presence of interaction (i.e., ϕ > 0), we observe a trend of decreasing polarization with increasing η at constant ϕ, for example in Figure 2C at ϕ = π/15. This phenomenon, commonly observed in the Vicsek model, shows that increasing the noise intensity destroys the group alignment of the system.
Next, by comparing pure vision in Figure 2A ($\alpha_a = 0$) with pure audition in Figure 2J ($\alpha_a = 1$), we identify disparate behavior, similar to the study in [27]. In particular, we observe that in pure vision, polarization is relatively small at small sensing angles ϕ ≤ 6π/15, whereas in pure audition, the particles achieve a polarization of one for ϕ > 3π/15. To understand the small values of polarization at small sensing angles in pure vision, we look at the corresponding values of mean cohesion and mean largest cluster size in Figures 3, 4, respectively. For example, for ϕ ≤ 6π/15, Figure 3A shows that the cohesion values are high, whereas Figure 4A shows that the size of the largest cluster is small. These results suggest that in pure vision at small sensing angles, the particles form multiple clusters, which decreases the size of the largest cluster. Furthermore, the particles within these individual clusters are closely spaced, which increases the cohesion values; however, the clusters do not necessarily move in the same direction, which decreases the polarization values. On the other hand, in pure audition, even for small sensing angles (3π/15 ≤ ϕ ≤ 6π/15), we observe high values of both polarization and largest cluster size but small values of cohesion. This is because all the particles group into one large cluster with the same heading direction and thus achieve a polarization value of one, but the particles within the cluster are loosely packed and evenly distributed in space, which decreases cohesion.

To explore the group-level behavior in composite sensing of weighted audition and vision, we start with polarization. The modified Vicsek model introduced in this work allows us to observe the changes in the group-level features as we transition from pure vision to pure audition by increasing the tuning parameter $\alpha_a$ from zero to one. Thus, we start with very weak auditory coupling but strong visual coupling.
Interestingly, Figure 2B ($\alpha_a = 0.01$) and Figure 2C ($\alpha_a = 0.05$) show that even in the presence of very weak auditory coupling, the polarization values at small sensing angles increase compared with pure vision in Figure 2A. As we keep increasing $\alpha_a$, we observe that the polarization values at small sensing angles (ϕ ≤ 3π/15) keep increasing from Figure 2D ($\alpha_a = 0.1$) to Figure 2G ($\alpha_a = 0.25$). The polarization reaches a maximum in Figure 2H, where $\alpha_a = 0.5$, which corresponds to an equal contribution from audition and vision. This can be observed by comparing the polarization values in Figure 2G (mean polarization 0.920), Figure 2H (mean polarization 0.936), and Figure 2I (mean polarization 0.924) at ϕ = π/15 and η = 0.9. From these observations, we conclude that the system can achieve near-perfect group alignment, even with a narrow sensing region, by combining information from audition and vision as in the weighted composite model. Furthermore, the system gains the maximum benefit when the contributions from the individual modes are equal.
Next, we explore group-level behavior in terms of cohesion and the largest cluster size in the weighted composite model. Figure 3 shows a decrease in cohesion values at small sensing angles with increasing $\alpha_a$, and cohesion is observed to be minimum when $\alpha_a = 0.5$. Similarly, Figure 4 shows an increase in the largest cluster size at small sensing angles with increasing $\alpha_a$, and the maximum value is observed at $\alpha_a = 0.5$. Moreover, from $\alpha_a = 0.2$ to $\alpha_a = 1$, differences in the three order parameters are observed only at the smallest nonzero sensing angle, ϕ = π/15, whereas the results are identical at other values of ϕ. These results show that once auditory sensing is introduced into the system, even in weak form, the traits of pure audition dominate the group-level features.

CONCLUSION AND FUTURE WORK
In the present work, we take inspiration from observations of real bats, which integrate visual and auditory information for effective navigation, and introduce a modified Vicsek model that uses a weighted scheme to update individual heading directions. Specifically, the update scheme allows the information from visual and auditory cues to be weighted differently. We then conduct simulations to study the effect of the relative weights ascribed to each sensing modality on emergent group behavior. Finally, we measure the group behavior in terms of three order parameters, and the results show that the group-level features of pure audition dominate the behavior. This study demonstrates that combining information from multiple sensory cues can play a significant role in collective behavior.
The present study could be improved by validating the model against empirical data from bats. However, large datasets on bats are scarce, as tracking individuals in a large group for a long time is an immensely challenging task. For example, an onboard microphone is needed to collect acoustic data at the bat's location, which is difficult given the bat's small size and light weight [42]. Finally, the present study relies on a "minimal" setup that can produce a cohesive moving group, which the Vicsek model achieves, in order to examine the effect of the simultaneous use of audition and vision on group-level behavior. Accordingly, our model is built on the same set of assumptions as the original Vicsek model, which limits its ability, in its current form, to incorporate some real-world features. For example, the speed of the individuals is assumed to be constant for all time. Moreover, the number of individuals is conserved, but the momentum is not conserved under the alignment interactions [43]. In addition, a two-dimensional model is considered in the present study as a first effort to implement composite sensing cues. In future studies, we will relax some of these assumptions: we will consider biologically relevant and geometrically different parameters to model the auditory and visual sensing sectors, explore a generalized three-dimensional Vicsek model with composite sensing modalities, and consider extrinsic (measurement) noise, which will make the model more realistic.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.