Coupling geometry on binary bipartite networks: hypotheses testing on pattern geometry and nestedness

Upon a matrix representation of a binary bipartite network, via the permutation invariance, a coupling geometry is computed to approximate the minimum energy macrostate of a network's system. Such a macrostate is supposed to constitute the intrinsic structures of the system, so that the coupling geometry should be taken as information contents, or even the nonparametric minimum sufficient statistics of the network data. Then pertinent null and alternative hypotheses, such as nestedness, are to be formulated according to the macrostate. That is, any efficient testing statistic needs to be a function of this coupling geometry. These conceptual architectures and mechanisms are by and large still missing in community ecology literature, and rendered misconceptions prevalent in this research area. Here the algorithmically computed coupling geometry is shown consisting of deterministic multiscale block patterns, which are framed by two marginal ultrametric trees on row and column axes, and stochastic uniform randomness within each block found on the finest scale. Functionally a series of increasingly larger ensembles of matrix mimicries is derived by conforming to the multiscale block configurations. Here matrix mimicking is meant to be subject to constraints of row and column sums sequences. Based on such a series of ensembles, a profile of distributions becomes a natural device for checking the validity of testing statistics or structural indexes. An energy based index is used for testing whether network data indeed contains structural geometry. A new version block-based nestedness index is also proposed. Its validity is checked and compared with the existing ones. A computing paradigm, called Data Mechanics, and its application on one real data network are illustrated throughout the developments and discussions in this paper.


Introduction
Ever since the assembly rules proposed in Diamond (1975) and Case and Sidell (1983), the presence-absence matrix has been the fundamental data type in Community Ecology. A presence-absence matrix is also called co-occurrence matrix. In fact, from data structural perspective, as being permutation invariant on both axes of matrix, this kind of data type should be precisely termed binary bipartite network. Such network data is now a major data type for understanding mutualistic system interactions in wide ranges of ecological studies (Bascompte et al. (2003)). A ecological mutualistic system concerns mutually beneficial interactions between a collection of animal species and another collection of plant species. One typical example is the flowering plants and their insect pollinators. The binary bipartite network records presenceabsence of a target interaction upon each animal-vs-plant entry. In other words, a binary bipartite network is used to approximate an ecological system from mutualistic perspective. In contrast, the directed binary network is used lacking proper algorithms for mimicking and generating matrices with distinct structural information. Above all challenging tasks of defining an effective testing statistics on matrix or network data have not been systematically resolved. Only heuristic and parsimonious solutions are suggested so far in literature.
All these aforementioned issues are systematically discussed, developed and resolved in this paper. Throughout this paper a binary bipartite network and its approximating mutualistic system are the primary concerns. Therefore the network's rectangle matrix representation has one axis for a collection of animals of interest and the other for a collection of plants under study. We first discuss computational developments for what visible geometric patterns are indeed embedded within the matrix, and then discuss whether such embedded geometric structures are coherent with the idea of nestedness. The first part of discussion resolves the issues arising from co-occurrence matrices observed in biogeographic systems.
By making use of the fact that a binary bipartite network is permutation invariant with respect to nodes on both axes, a new computing paradigm, called Data Mechanics, is applied to extract a combination of a deterministic multiscale structures and a stochastic uniformity from such data (Fushing and Chen (2014)). The coupling of deterministic and stochastic structures is termed a coupling geometry. This resultant coupling geometry is taken as the computable information contents of the network data because it is very close to the minimum energy macrostates of the target system. From statistical physics perspective, all microstates are to conform to such macrostates. Such a conformation implies a principle of how to mimic an observed network data (Fushing and Chen (2014)). Specifically the deterministic multiscale structures are the visible patterns contained in the data, which are what have been missing in (Connor and Simberloff (1978)), while the uniformity enables us to mimic and to generate various ensembles of matrices with different geometric pattern information.
Another major concept proposed in this paper is that the conceptual nestedness on a data matrix has to be adapted upon the computed deterministic multiscale structures. This adaptation is meant to build the least nestednessbearing construct containing observed deterministic multiscale structures. As such testing the hypothesis of whether a data matrix embedding with geometry of nestedness is to evaluate the degree of structural differences between this nestedness-bearing construct and the original coupling geometry's deterministic structures. Based on this concept, we propose a block based nestedness index and compare it with three existing popular indexes. Among these three indexes, one is originally proposed in Patterson and Atmar (1986) and the other two are the improved versions (Almeida-Neto et al. (2008); Atmar and Patterson (1993)). Ironically we found that these two improved versions are indeed improper. Throughout this paper we use the wellstudied Mammal data in (Patterson and Atmar (1986)) for illustrating and expositional purposes.

From intuitive grouping ideas to coupling geometry
Within an ecological system, all intrinsic patterns of the mutualistic interactions between a collection of animal species and another collection of plant species, beyond individual-to-individual level, are the supposed information contents to be contained within the observed binary bipartite network. Such pattern information intuitively would be jointly expressed through clusters of similar animals coupling with clusters of similar plants in a fashion of block-wise uniformity. That is, on multiple global levels, dissimilar clusters on one axis would reveal contrasting configurations of clusters on the other axis. As such, scientists can visualize why different clusters of animal are characterized distinctively with respect to differences among clusters of plant. That is, the information contents within a binary bipartite network data are multiscale and visible, more importantly they are computable.
Hierarchical clustering can somehow capture and visualize block-wise clustering of a matrix, but it tends to produce clusters with imbalanced sizes and each block lacks uniformity (Johnson (1967)). Recently such multiscale information patterns are computed through a new computing paradigm, called Data Mechanics, developed in Fushing and Chen (2014). Computationally, Data Mechanics indeed attempts to solve an optimal permutation problem of achieving the minimum total variation among all possible matrix-representations of the observed bipartite network. Here the total variation is defined with respect to a choice of neighborhood system, such as the set of immediate neighbors on the rectangle matrix lattice. A version of total variation with detailed formula is given in Supplementary Section A. This discrete combinatorial optimization is operated based on the permutation invariance of a bipartite network with respect to its nodes of animals and plants. The complexity of this problem surely depends on the exponentially growing factorials of sizes of the animal and plan collections. Though the concept of pattern information contents contained on a binary bipartite network is intuitive, the computing for the multiscale structures can be a rather complex problem. Data Mechanics is designed to provide optimal or nearly optimal solutions to this computational problem.
The algorithm for computing ultrametric trees, a key part of Data Mechanics, is called Data Cloud Geometry (DCG). Developed in Fushing and McAssey (2010), DCG is aimed to construct ultrametric trees via on multiscale clustering, which has been widely used in many fields (Balasubramaniam et al.  [Patterson and Atmar (1986)]. Interactions are in black and non-interaction are in yellow. 2017a); Li and Tam (1998)). With the iterative computing of DCG on row and column axes, Data Mechanics converts unstructured binary biparite networks into multiscale block patterns framed by two ultrametric trees iteratively built upon the two axes, respectively.
The stochastic structures are found within each block formed by a core cluster on row axis and one core cluster on the column axis. Such two-dimensional uniform randomness is subject to row and column sums sequences of the involving block. Here core clusters of an Ultrametric tree are identified on its bottom tree level. That is, the finest scale structure of a coupling geometry is referring to block patterns formed via core clusters, while the coarsest structure referring to the one framed by one cluster containing all animals and one cluster containing all plants. The scales between these two extremes are specified by tree levels between the top and the bottom one.
Thus, by designing all resultant optimal and nearly optimal solutions, we illustrate multiscale block patterns through the coupled framework of two ultrametric trees built on animal and plants axes in Fig.1

(a) and (b).
It is clear that the data-driven deterministic multiscale block structures brought out by multiple tree levels of two Ultrametric trees frame and summarize the interacting relational patterns between animals and plants. The coupling relation of these two trees in fact is derived iteratively and alternatively by applying a computing algorithm, called Data Cloud Geometry, which serves as the key device of the Data Mechanics. The iterative procedure is designed to update a distance measure used in the previous iteration by taking the currently computed tree structural information into considerations, while the pro- cedure of alternating between animal and plant axes is designed to build the dependence or coupling of the two trees. It is noted that throughout this paper a computed coupling geometry (with energy -2184), not the actually lowest energy matrix configurations, is employed as the foundation of all developments. The reason for the parsimonious approach is purely for computational effectiveness. For instance, one lowest energy matrices (with energy -2204) of the Mammal data are found in Fig.2.
It is typically that the computed coupling geometry is pretty close to the solution of lowest energy one. And it is the right starting point of searching endeavors of finding the optimal solution. However it should be noted that it usually takes a huge amount of computing efforts in order to achieve the optimal goal.
Beside the deterministic multiscale block structures, Data Mechanics computations also bring out block-wise stochastic randomness. This stochastic component is specifically seen as the uniformity within each block found on the finest scale.
2.2. From coupling geometry to block-based testing statistics for structural hypothesis Here we construct a reasonable and effective testing statistic in regarding to nestedness as a hypothesized geometric structure upon an observed binary matrix, or a binary bipartite network. Due to the fact that the coupling geometry is very close to the minimum energy macrostate of the system approximated by the data network, it is necessary to treat such a coupling geometry as the minimum sufficient statistic. It is a fundamental principle in statistical thinking that an efficient testing statistic should be based on the computed coupling geometry as the data's minimum sufficient statistic. Therefore the most relevant geometric structure of nestedness must be its least version that contains the coupling geometry. So theoretically finding the least containment is an optimization problem.
Let N G denote the least nestedness geometric structure defined on the same matrix lattice as that of the originally observed data matrix. However there is no need to explicitly compute it because of the multiscale block patterns of the computed coupling geometry. Thus, we need only to evaluate the functional characteristics of N G in terms of all involving blocks, which are found on the finest scale upon a coupling geometry, as shown in Fig. 1 and 2. That is, λ(B (N G ) ij ) , the intensity of 1's in block B ij , has to satisfy the following two properties to be in accord with nestedness: 1) 1st order property: {λ(B (N G ) ij )} is decreasing with respect to given all j's, and at the same time, increasing with respect to given all i's.
2) 2nd order property: has at most one sign-change from positive (+) to negative (−), that is, being concave down-ward to concave-upward; while {∇λ(B (N G ) ij |R)}, 2nd order differences on jth column: has at most one sign-change from negative (−) to positive (+), that is, being concave up-ward to concavedownward.
Another important property of the second order is that sequences of {sign(∇λ(B (N G ) ij |C))} and {sign(∇λ(B (N G ) ij |R))} contain the corresponding sequences of signs pertaining to the coupling geometry, denoted as C G . On the Mammal data, the 5 × 5 matrix [λ(B (N G ) ik )] of block-wise intensities of the coupling geometry is calculated as: Based on the above block-based nestedness perspective, the following nestedness-index for a simulated matrix, denoted by S , is proposed: The first two terms on the right hand sides of index N CG are costs against the linear ordering along the columnindex on every row, and along the row-index on every column. The product terms are designed to be negative in values if the linear ordering holds, and positive if the linear ordering fails. So the larger N CG value is, the farther away from the nestedness. The 3rd and 4th terms are counting the coherence of 2nd order differences with that of N G . Positive and larger values indicated incoherence or violations of nestedness.

From coupling geometry to matrix mimicking
The primary use of a computed coupling geometry from a binary network is to make possible for generating a series of ensembles of matrix or network mimicries bearing  with decreasing degrees of geometric structural information from the finest to the coarsest scales. The matrix ensemble pertaining to the finest scale of structural information is generated by patching up all simulated blocks, which are marked by core clusters of Ultrametric trees of animal and plants, subject to block-version row and column sums sequences. This is an ensemble that conforms to the minimum energy macrostate of the ecological system from statistical physics perspective. While the ensemble pertaining to the coarsest scale of structures is simply referring to the collection of matrices satisfying the constraints of row and column sums sequences of the observed entire matrix as a block. The generative algorithm employed here is the one proposed and used in Miller et al. (2013). A brief illustrating example and summary of this algorithm are given in the Supplementary Section B. But it is worth noting that this algorithm is effective for small sizes of binary data matrix, such as the Mammal data. It breaks down even on the 50 × 50 matrix. The key factor affecting the performance of the algorithm is the matrix's sparsity of 1's. In ecological literature, 2 × 2 checker-board switching and its improved version, curveball algorithm (Strona et al. (2014)), are also popularly used to generate binary matrices with constraints of row and column sums sequences. Basically the 2 × 2 checker-board switching and its variants are searching for new solutions by going away from an existing one. In contrast, Miller and Harrisons algorithm intrinsically simultaneously solving the linear equations imposed by the constraints of row and column sums sequences. Thus these two matrix generating algorithms are rather distinct in nature. And both types of algorithms suffer distinct drawbacks to be applicable widely.
The drawbacks of 2 × 2 checker-board switching and its variants are: first, they generate dependent matrices depending on the initial matrix; secondly, their energy spreads are relatively too narrow, indicating that they have preference for previously sampled matrix configurations. One evident view of such drawbacks is revealed in Fig.3. Further our computer experiments show that the generating processes have rather short recurrent time cycles, that is, repeated matrices being generated too often. This phenomenon indicates that the generated trajectory might have been confined within a small region.
Here we tentatively propose a practical way of resolving the issue of large data matrix that currently limiting Miller and Harrisons algorithm. By incorporating with a randomized divide-and-conquer sampling scheme on the observed data matrix, the whole matrix is divided into blocks, on which Miller and Harrisons algorithm becomes applicable. This sampling scheme can be made to accommodate heterogeneity brought out the coupling geometry on both axes.

From matrix ensembles to energy profile
The entropies of this series of ensembles are defined as the logarithm of their sizes. For the Mammal data in Fig. 1 (b), the serial sizes of ensembles are computed via an algorithm from Miller et al. (2013) as follows: the size of the finest scale (E 5×5 version) ensemble is 1.3 × 10 8 , the E 4×2 version is 4.47 × 10 16 , the E 2×2 version is 1.45 × 10 29 and the E 1×1 version (the coarsest scale one) is 2.7 × 10 39 . Such quantities of ensemble size or entropy bring out the quantitative sense of structural differences among multiscale block geometries embedded within the originally observed network.
Another important aspect of such differences is revealed via a profile of energy distributions, as shown in Fig. 4 on the Mammal data. It is evident that the two versions, E 5×5 and E 1×1 , of ensembles are very different in a sense that a randomly chosen matrix from E 1×1 ensemble would appear very different from any one fromE 5×5 ensemble. The shifting-to-right pattern of the energy distribution profile strongly implies that computable and then observable block patterns contained in the coupling geometry is persistently eroding. The nearly complete separation of the two energy distributions based ensembles E 5×5 and E 1×1 , respectively, indicates that the coupling geometry is not likely resulting from a random sampling. In this fashion, the hypothesis of co-occurrence patterns is tested if the original binary bipartite network is represented by a presence-absence data matrix. (See the Supplementary Section A for comparisons of energy index with other indexes of co-occurrence.)

From coupling geometry to block-based testing statistics for structural hypothesis
So far there are at least three nestedness indexes have been proposed in literature. They are N + counts (Patterson and Atmar (1986)), T(temperature)- (Atmar and Patterson (1993)) and NODF (Almeida-Neto et al. (2008)) indexes. The last two indexes are newly proposed and supposedly to improve the first index. However, as shown in the Fig. 5, these two supposedly improved versions are indeed improper. On the contrary, though it might not be effective, the originally proposed N + count is not unreasonable.
Three existing indexes and the index NCG are computed upon four ensembles: E 5×5 , E 4×2 , E 2×2 and E 1×1 , derived from the coupling geometry of Mammal data. Their corresponding distributions are presented in the following four panels. From the panel (a) for "N + ", though we see the distribution based on E 5×5 is somehow overlapping with the one based on E 1×1 , while their two modes are evidently separated. And their relative positions are correct with the distribution based on E 5×5 as being more toward the nestedness and on the left of the one based E 1×1 . In contrast, via panel (b) and (c) for T(temperature)and NODF indexes, all their distributions are nearly completely overlapping with each other. Such complete overlapping phenomena strongly indicate that both indexes T(temperature)-and NODF are not effective statistics for testing nestedness given that the two ensembles E 5×5 and E 1×1 are very different in energy and size, so in pattern information. Finally, on the panel (d), the profile of distributions of N CG based on E 5×5 , E 4×2 , E 2×2 and E 1×1 is progressively shifting to the right as being away from nestedness. It is noted that the singleton based on ensemble E 5×5 is located at the extreme left tail of the one based on E 1×1 . And it is understood that the 2nd order differences component in such an index is the key aspect that separate the coupling geometry and matrices in E 4×2 , E 2×2 away from E 1×1 .
Here we make a remark on the current state of knowledge: such an issue of how to systematically define an efficient testing statistic even on a binary bipartite network is still wide open at current state of knowledge. In fact it is expected because a bipartite network is indeed used to approximate a complex system state. This system state is very much nonparametric in nature in the sense of containing nonlinearity, dependence and heterogeneity.

From coupling geometry to formulating the structural hypothesis
Finally we discuss two issues: 1) what are the rationales behind comparing the two distributions based on ensembles E 5×5 and E 1×1 in the Mammal data? 2) Why a hypothesis based on a binary bipartite network has to be formulated and tested in a conditional setting? These two issues are indeed two sides of the network datas information contents.
First insight coming from the datas matrix representation is that the row and column sums sequences might well be deterministic. That is, the two sequences involve with no randomness at all. Since it is what the system is at the moment the data being collected. Therefore conditioning on the two sequences is not only preferable, but necessary.
Via the conditioning thinking, the null and alternative hypotheses: Ho and Ha, are to be formulated as: Ho: The mutualistic system of animal and plant interactions doesn't contain nestedness related patterns beyond the minimum patterns sustained by the row and column sums sequences.
Ha: The mutualistic system of animal and plant interactions does contain more nestedness related patterns in its minimum energy macrostate than the minimum patterns sustained by the row and column sums sequences.
Under Ho for the Mammal data, the null distribution with respect to any testing statistic is exactly the corresponding distribution derived from the ensemble based on E 1×1 . However, under Ha, typically there exists one or  many lowest energy macrostate embedded within an observed binary bipartite network data. For instance, we can find more than 300 matrices with lowest energy for the Mammal data. However explicitly finding minimum energy macrostates can be impractical due to huge computational loads. This is usually the case for large data networks. Hence a computable coupling geometry is the pragmatic candidate, on one hand. On the other hand, since the hypothetic geometry specified on the alternative has to contain the minimum energy macrosate, so that an efficient testing statistic has a function of the coupling geometry as a minimum sufficient statistics. That is why the testing statistics is defined based on the finest scale block patterns. Such testing statistics are nearly identical with the ones defined on the real minimum energy microstates. Therefore the distribution under Ha pertaining to any efficient testing statistics has to be singleton based on the finest scale blocks, such as E 5×5 in Mammal data.
Nonetheless, it is critically important to emphasize here that any non-efficient statistics will have many observed values under the alternative hypothesis, such as index on the Mammal data on ensemble E 5×5 . So there would be a distribution of p-values. That is, the report of one single p-value is not valid in such a hypothesis testing setting. We indeed need to report such a distribution of P-values.

Discussion
We computationally extract a coupling geometry, consisting of deterministic and stochastic structures, embedded within an observed binary bipartite network as its information contents. From physical perspective, it is a minimum energy macrostate, and at the same time, from statistic perspective, it is the minimum sufficient statistic. Therefore any microstates as mimicries of the observed data network have to conform to this coupling geometry, while any potentially efficient testing statistics have to be a function of it. Further the pertinent geometric structure of nestedness has to be the least construct containing such a coupling geometry. These are fundamental facts underlying any coherent data analysis on binary bipartite networks. Significant implications include that the formulations of hypotheses are needed to be based on the minimum energy macrostate. And any potential nestedness index has to be in a form based on block patterns found on the finest scale.
The computable coupling geometry also facilitates various ensembles of matrix-mimicking according to its multiscale block patterns. Because of the profile of ensembles bearing with monotonically less geometric structures, any reasonably effective nestedness index will give rise to gradually separating index-based distributions: from the finest scale to the coarsest scale. That is to say that an index is not effective if it misses such a gradually separating pattern. When an index, such as N CG proposed here, is defined based on the finest scale blocks, it gives rise to singleton on the finest scale ensemble. Otherwise there would be a distribution of P-values.
On the front of generating random matrix subject to the two sequences of row and column sums, experiences from our computer experiments reveal that the commonly used 2 × 2 checkerboard swapping and its variants need big perturbations in order to achieve more uniform sampling. A perturbation-aided sampling scheme, based on a coupling geometry and Miller and Harrisons algorithm (Miller et al. (2013)) can generate and sample large random matrices up to thousand-by-thousand in size.
As a final remark, a coupling geometry computed from a binary bipartite network data can further afford an approach to compare the marginal tree structure on one axis with its corresponding phylogenetic tree as a new way of evaluating phylogenetic effects. Such a comparison of two trees structures can be performed via a technique called, partial coupling geometry, which is developed in the spirit of mutual information in information theory.