Who Is Where in Marine Food Webs? A Trait-Based Analysis of Network Positions

Networks of trophic interactions provide a lot of information on the functioning of marine ecosystems. Beyond feeding habits, three additional traits (mobility, size, and habitat) of various organisms can complement this trophic view. The combination of traits and food web positions are studied here on a large food web database. The aim is a better description and understanding of ecological roles of organisms and the identification of the most important keystone species. This may contribute to develop better ecological indicators (e.g., keystoneness) and help in the interpretation of food web models. We use food web data from the Ecopath with Ecosim (EwE) database for 92 aquatic ecosystems. We quantify the network position of organisms by 18 topological indices (measuring centrality, hierarchy, and redundancy) and consider their three, categorical traits (e.g., for mobility: sessile, drifter, limited mobility, and mobile). Relationships are revealed by multivariate analysis. We found that topological indices belong to six different categories and some of them nicely separate various trait categories. For example, benthic organisms are richly connected and mobile organisms occupy higher food web positions.


INTRODUCTION
In order to sustain the proper functioning of ecosystems, we need to better understand the simple question of Lawton (1994): What species do in ecosystems? Since ecological roles and food web positions are not independent (Luczkovich et al., 2003), we address the question what kind of species occupy certain kinds of network positions.
Since the very first attempts to identify keystone species (Paine, 1966(Paine, , 1969, there has been an interest in their place in food webs (Mills et al., 1993;Power et al., 1996). First they were suggested to have been top predators, then also plants, herbivores, and parasites (Bond, 1994;Marcogliese and Cone, 1997). For both community ecology and conservation biology, it would be very useful to know where are they in complex trophic networks.
While it is clear that the relative importance of organisms varies with time and space, looking at a large database may provide some general insight into the problem. If certain types of organisms occupy certain types of network positions, results can increase the predictability of food web modeling. Comparisons of centrality indices with each other (the similarity of DC and CC: Jordán et al., 2007; K predicts KSI better than s: Endrédi et al., 2018b) and centrality indices with trophic level (most high-centrality species at medium trophic levels: Scotti and Jordán, 2010) were done to better understand critically important positions of organisms in food webs. Extending this interest by adding trait data to trophic groups helps the biological interpretation of the results. Relationships between centrality indices have been studied for other network types as

Index name Description
Degree centrality (DC) Number of other nodes connected directly to the considered node (Wasserman and Faust, 1994).
Weighted degree centrality (wDC) Sum of weights of links adjacent to the considered node (Wasserman and Faust, 1994).
Betweenness centrality (BC) Frequency of the considered node on the shortest paths connecting all pairs of other nodes (Wasserman and Faust, 1994).
,i = j, k g jk is the number of equally shortest paths between nodes j and k, g jk (i) is the number of these shortest paths to which node i is incident in the length of the shortest path between nodes i and j in the network.
Closeness centrality (CC) Quantifies how short are the minimal paths from a given node to all others.
d ij is the length of the shortest path between nodes i and j in the network (Wasserman and Faust, 1994).
The topological importance of species i when effects "up to" n steps are considered is the sum of effects originated from species i up to n steps averaged over by the maximum number of steps considered (i.e., n): ji n a m,ji is the effect of j on i when i can be reached from j in n steps (Jordán et al., 2003). We analyzed indirect effects of maximum three steps (n = 3). WI i n is the same but with weighted links.
We can assess the overlap in the neighbors of two nodes quantifying the uniqueness or redundancy of nodes (Jordán et al., 2009;Lai et al., 2015), as a function of a t threshold for the TI n and the WI n matrices, providing TO and WO, respectively.
Status (s), contra-status (s ) and net status ( s) In a directed strong hierarchy, the status is the sum of ij distances from node i to every other node j. Reversing the hierarchy (reverting the direction of the links), the same calculation will give the contrastatus of each node (s i ) (Harary, 1959): Keystone index and its components (K, K bu , K td , K dir , and K indir ) The keystone index of a species i is defined as (Jordán et al., 1999): well, including habitat networks (Baranyi et al., 2011;Pereira et al., 2017). With large databases and new statistical analyses, these questions can be re-investigated and our knowledge can be updated. In this article, we consider a large database of trophic networks, described by standard methodology for both data collection and network construction, making them comparable. We (1) characterize the network position of each trophic component by a variety of topological indices, quantifying centrality, hierarchy, redundancy, keystoneness, and trophic level, (2) characterize each trophic component by three traits, and (3) use multivariate methods for comparisons between various topological indices and between topological indices and traits.

MATERIALS AND METHODS
Data from 92 Ecopath with Ecosim (EwE) aquatic food web models were compiled using the EcoBase online database repository (Colléter et al., 2013) and previously published sources (Heymans et al., 2014). These networks have varying number of nodes (ranging from 8 to 63) but were assembled using comparable methodology of the EwE framework (Christensen and Walters, 2004;Heymans et al., 2016). For models of the same ecosystem described in different years, we used the most recent one (considering the year of publication). The compiled data represent five global regions with diverse ecosystems: 14 models from Africa, 14 from Australasia, 29 from Europe, 27 from North America, and 8 from South America (Supplementary Material A).
The network position of each trophic component in each trophic network was characterized by 18 topological indices (see Table 1 for description of computed indices). Centrality was quantified by six indices (four binary and two weighted), we used eight indices for hierarchy (i.e., centrality in DAGs), two indices for redundancy (topological overlap), one for keystoneness (KSI, Libralato et al., 2006) and also the measure of trophic level as it is used in EwE. The last two indices were retrieved from previous publications (see Heymans et al., 2014 and the references in Supplementary Material A). All other topological indices were computed using programs UCINET (Borgatti et al., 2002) and CoSBiLab (Valentini and Jordán, 2010).
In order to be able to use a wide range of topological indices, some of them with specific requirements, it was necessary to pre-process the database in a few steps. This ensured the applicability of indices and the comparability of the results. Since we focus on the interactions among living organisms, we deleted (1) non-living network components (e.g., DOM) and (2) living components that became isolated nodes after deleting the nonliving ones (e.g., holothuroids in the Kuosheng Bay network). From an energetic point of view, detritus and cycling are clearly crucial to ecosystem dynamics, however, topological indices (who interacts with whom) may provide biased results and artifacts if non-living components are not deleted (e.g., detritus can simply be connected to each living component). We doublechecked if this data processing had a major effect on the KSI and TL index values and found the difference only minimal and safely negligible (TL was changed highly consistently across the networks, as almost the same trophic groups were removed from almost the same positions, while KSI-values still quantify nodes in the original networks but their re-calculation is not possible for the modified networks -from a comparative perspective, neither makes real difference). This process rendered one small network (Maspalomas Lagoon) without primary producers, thus not usable for our study. Altogether this resulted in the deletion of 150 network components (127 non-living and 23 living) (Supplementary Material A). On average, this means 1.63 node (6%) per network. One additional node, Stellar Sea Lion pup ("SSL pup") from the Aleutian Islands model was an outlier (due to asymmetric connections of only having one predator and no prey) and was omitted. Before computing nonhierarchical indices, networks were symmetrized by summing the interactions' strengths. All loops were eliminated from 57 food webs to be able to compute hierarchical indices (detailed methods can be found in Supplementary Material B). Functional groups were assigned to three categorical traits (i.e., feeding habitat, mobility, and size category, Table 2) and one continuous trait (maximum body size). In general, the trait for the foraging adult form was considered, unless age (e.g., juvenile) or size (e.g., small) was specifically noted. Specieslevel habitat preference and maximum length measurements (in cm) were extracted from the FishBase (FishBase, 2020) and SeaLifeBase (SeaLifeBase, 2020) online databases and assigned to larger functional groups.
Generalizations are inevitable where species are not listed or are aggregated into functional groups (common practice in food web studies). Below we describe the generalizations we encountered and the methods used for trait assignments. First it is noted that we needed to work with a small number of relatively large categories in order to keep the cross-ecosystem analysis feasible (more detailed classifications would reduce comparability). Several traits can be defined only for a smaller range of organisms, like "pigments" for phytoplankton (Weithoff and Beisner, 2019) or "dive duration" for the megafauna (Tavares et al., 2019). We tried to maintain the coverage of trait data for the possibly largest set of trophic groups (Kremer et al., 2017). For the habitat preference trait, benthic organisms included all of those associated with the benthos (infauna and epifauna) as well as demersal species (e.g., flatfish and rays) or those otherwise described living near the bottom (e.g., sandy or muddy surfaces) -all available in FishBase's species environment and biology descriptions. For other, non-specified fishes and sharks, we defaulted to the water column habitat. Phytoplankton, zooplankton, jellyfish, sea birds, sea turtles, and cetaceans were also assigned to the water column habitat. Other important categories (mesopelagic) were not considered for maintaining comparability and wide coverage among different ecosystem models, even if their importance is clear (Agnetta et al., 2019). The mobility trait was organized into four categories: sessile (attached), drifter (passive movers), limited mobility (slow active movers, including burrowers, and crawlers), and mobile (fast active movers and swimmers) (Costello et al., 2015). Sessile (e.g., macrophytes and barnacles) and drifter (e.g., plankton, bacteria, and fish larvae) organisms are biologically well-defined. Limited mobility organisms were mainly macroinvertebrates (e.g., echinoderms, gastropods, and annelids) and juvenile fish, whereas vertebrates capable of swimming (e.g., adult fish, turtles, birds, and marine mammals) were mobile. For non-speciesspecific size data (e.g., microzooplankton), we used Sieburth et al. (1978) plankton size fractions to extract maximum length (cm). Our data range from bacteria (0.0002 cm) to blue whales (3300 cm). Based on Sieburth's size fractions, functional groups were assigned to one of eight size categories (each category increasing by a factor of 10) (see Table 2).
Data coverage was relatively even (>93%) for the three categorical traits ( Table 2). The continuous trait, maximum length, had the lowest data coverage (71%) and was not analyzed separately in this study, however, it was used to assign the nodes to body size classes. Distinction into trait categories was not always clear-cut due to ontogenetic shift in diet and habitat preferences (e.g., bathypelagic species) or food web aggregation problems (mixed groups or broad categories). For these functional groups, we made a case-by-case evaluation based on the detailed metadata (description based on original EwE  Figure 1 superimposed. Although the ordination is nonmetric, the correspondence between the two results is remarkable, except for the positions of K-related indices. publications) or left the trait blank ("NA"). If available from the metadata description, one representative was selected and categorized accordingly. Overall, our data sets are comparable in the sense that they have low resolution at the bottom (e.g., phytoplankton as a single group) and higher resolution at the top (e.g., fish species listed).
First, the relationship between topological indices was investigated using Spearman's rank correlation and multivariate analyses [Principal Component Analysis (PCA), hierarchical clustering, and Nonmetric Multidimensional Scaling (NMDS)]. PCA and hierarchical clustering were used as metric exploratory methods to reveal groups and correlations amongst the 18 indices. The results were compared with those obtained via ordinal methods (Spearman rank correlation and NMDS). PCA works well for linearly correlated data and requires few assumptions (e.g., accepts negative index values such as in s' or KS). Standardized PCA was applied to ensure commensurability of indices. Data for hierarchical clustering were standardized by the standard deviation of variables and then the indices were classified using Euclidean distance and the unweighted pair group method with arithmetic mean (UPGMA or group average method). While other clustering methods do exist, UPGMA was selected based on the highest cophenetic correlation value, which measures how closely the original distances are reproduced by distances in the dendrogram (Sokal and Rohlf, 1962). These methods are able to maintain much of the original metric information in the data, i.e., differences between the scores. Ordinal methods operate by reducing data to ranks thereby disregarding metric properties. From the Spearman's rank correlation coefficients (ρ), a dissimilarity semi-matrix was calculated according to the formula d = 1 − ρ, effectively converting the correlations to the interval [0,2]. Thus, d = 0 means complete similarity corresponding to identical rank orders, and d = 2 reflects complete dissimilarity, i.e., reverse rank orders. The matrix thus obtained was used as input to NMDS. Spearman's correlations were visualized by a matrix plot, while the dissimilarity values were subjected to NMDS to provide an ordination of indices. Analyses were computed and results were displayed using R software [R Core Team, 2020; packages: "stat" and "ggcorrplot " (Kassambara, 2019)], and the SYNTAX-2000 package (Podani, 2001).
Second, for testing the independence of the three categorical traits, Pearson's Chi-squared test and Fisher's exact tests were performed with simulated p-values, using the "stat" package in R (R Core Team, 2020).
WI 3 , K, K dir , and K indir ), and all indices were studentized within their network. The latter means that all index values were subtracted from the sample mean (mean value of the index in its network) and divided by the standard deviation of the sample. The transformations did not change the trends of the relationships between the indices and the traits but helped meet the model assumptions and make the values more independent from the network features. Mixed-effect models were built in R, using "lme4" (Bates et al., 2015) and "lmerTest" (Kuznetsova et al., 2017) packages.

RESULTS
In the dendrogram resulting from the hierarchical classification of indices, six clusters are recognized at the level of 50 (Figure 1, right). Centrality indices (DC, CC, BC, TI 3 , and TO) are grouped into the first cluster. The keystone index (KSI) is a singleton. The indirect component of the K index (K indir ) and K are the closest pair and comprise group three together with K dir , K bu , and WI 3 . These two latter indices are related by both emphasizing bottomup groups. The fourth cluster is somewhat mixed, containing two hierarchical indices (s and s) and a weighted index (WO). Weighted degree centrality (wDC) was found separately in group five. The sixth group is made up of three classical top-down indices (s , TL, and K td ). The discussion of indices and traits will FIGURE 5 | The relevance of each topological index for separating each possible pair of trait values. Significant differences between the topological positions of nodes with different trait categories are marked by colors. For example, the first row shows that DC is different for sessile and drifter organisms but does not separate them.
be based on the groups classified in this dendrogram (Figure 1), since it had a high cophenetic correlation (r = 0.8267) indicating minimum distortion compared to the input Euclidean distances.
The NMDS ordination (Figure 2), even though an ordinal approach, identified largely the same clusters (stress = 0.08). The major difference is that K bu and WI 3 fall away from the other three K components (K, K dir , and K indir ) with which they formed a cluster in metric analysis, showing the inconsistent behavior of these K components. This pattern can also be observed on the matrix plot of Spearman rank correlations (Figure 1, left). In this diagram, rank correlations are contrasted with metric clustering, showing that the cluster membership of K dir is the most ambiguous. The results of PCA can be found in Supplementary Material C. All four methods agree on the correlation of these indices, except for the above-mentioned K components (which are emphasized differently in metric versus ordinal approaches).
Next, we assessed the relationships of three common categorical traits (mobility, habitat, and size) with the 18 indices. We were interested in finding out which trait has predictive power in these aquatic ecosystems and which is negligible. We ran mixed-effects models on the combination of these traits to predict the importance of specific trophic groups in the networks (see Supplementary Material E). An example network for the food web of Bay of Calvi is shown for visualizing the relationships of the mobility trait with the indices WI 3 ( Figure 3A) and K td (Figure 3B). The former emphasizes bottom-up groups (e.g., sessile and drifters) and the latter brings attention to mobile groups at the top of the food web. Figure 4 shows the relationship of the three traits with one representative index per dendrogram group, while Figure 5 summarizes the results of all pair-wise comparisons based on the mixed-effect models.
For mobility, pairwise comparisons are almost always significant, especially for weighted (wDC and WI 3 ) and top-down indices (TL, ss , and K td ) (Figure 5). Weighted indices emphasize drifter organisms, while top-down indices draw attention to mobile organisms. This is nicely visible in Figure 4A and in the violin plots of Supplementary Material D. Centrality indices highlight limited mobility animals. All other groups suggest the importance of drifters. Therefore, depending on what index we utilize, we can predict different groups with the mobility trait. Naturally, a balanced description of a network using one-two indices from each of the six groups is the best. For the mobility trait, groups 3, 4, and 5 are very similar and could be combined in the functional sense (see violin plots in Supplementary Material D).
The habitat trait only had two categories and is less useful in predicting the difference between groups ( Figure 4B). Centrality indices were significantly larger values in the benthic than in the water column habitat (Figure 5). The TL, K, and K bu indices were the opposite (benthic < water column). All other indices had no significant difference between habitat preference of the organisms (Figure 5 and Supplementary Material D). It is somewhat difficult to interpret the biological meaning of these results. With too few, or too many categories, it becomes difficult to interpret the results. Simple traits, such as this one could be useful combined with other studies.
The third categorical trait, size had the opposite problem (with having many, eight categories). This trait behaved in a similar manner as the mobility trait ( Figure 4C). Weighted indices along with the third and fourth index clusters highlight small organisms (0.001-0.1 cm), most likely a reiteration of the drifter mobility category. The keystoneness index (KSI) is not significant in relation to differences in size categories. The centrality cluster seems to favor medium-sized categories (1-10 cm) and top-down indices points out the large-sized groups (>10 cm) ( Figure 4C and Supplementary Material D).
To summarize, mobility was the most reliable trait (>80% pairwise comparisons showing significant differences) and worked best combined with top-down (TL and K td ) or weighted indices (wDC). The size trait showed significant differences between 70% of pairs. Finally, habitat trait was only significant about 50% of the time (although works well for all centrality indices) (Figure 5). Regarding the relationships between the analyzed traits, all trait-combinations were significantly dependent (Chi-squared test and Fisher's exact test, p < 0.001).

CONCLUSION
The major component of sustainability is proper ecosystem functioning and different organisms play their distinct roles in ecosystems. Ecological roles and positions are interdependent, so studying food web position can help to assess functional importance. We addressed the question what kind of organisms (in terms of various traits) occupy what kinds of food web positions (in terms of various centrality indices).
Earlier work on the relationships between food web properties and ecosystem types provided valuable information on the use of indicators at the system level (Heymans et al., 2014). Here, we elaborated this kind of approach at the local level of organisms (trophic groups). The combination of a rich description of network position and the parallel analysis of multiple traits offers a way to improve ecological indication and predictive food web modeling.
For our analysis, it was crucial to set high standards for comparability. The EwE food web database is based on a constant and rigorous modeling approach (similar trophic components across food webs), the way of aggregation is also consistent (stronger at lower levels, e.g., phytoplankton) and mixed-effect models showed that networks (as random effects in the models) had zero or negligible explanatory power due to variance being around zero in most cases (n = 13 indices). The variance due to random effects (networks) was largest for five indices (BC, TO, WO, wDC, and K indir ), but still of minor importance (<0.30).
Our findings agree with the suggestions of Costello et al. (2015) that mobility and size should be included in describing aquatic systems. Some of the results are thus quite intuitive (e.g., more mobile organisms at the top of the food web): these are only confirmed and quantified by the present, large-scale statistical analysis. Other results may be more surprising, like the importance of benthic organisms in the food webs. These species or groups of species are fundamental for transferring matter and energy from the sea bottom to the water column through trophic flows contrasting the natural gravity-related flows and thus contributing to the cycling of energy and matter. Quantification and statistical significance are the ways for robust predictions.
Our study connects theoretical, network-based indicators of ecological role (i.e., topological position) and practical, ecologically meaningful categorizations (i.e., traits). Exploring this bridge is essential for giving the appropriate value to theoretical works also in supporting practical applications (Longo et al., 2015). Notably, the importance of such bridge is testified by the large discussions going on for finding the appropriate measures (Tam et al., 2017) to use in evaluating good environmental status for descriptor D4 (food webs) in the Marine Strategy Framework Directive (EU MSFD, 2008).
Certain pairs of centrality indices are consistently similar in different studies. For example, the weighted indices tend to provide similar node ranks (Jordán et al., 2006;Lai et al., 2015) with only a few exceptions (see Jordán et al., 2007). Closeness centrality is less predictable: it can be quite close (Jordán et al., 2006) or quite far (Lai et al., 2015) from degree. The classification depends also on whether it is based on ranks or distributions (Bauer et al., 2010).
It remains important to investigate what other traits are of potential significance in aquatic ecosystems (e.g., diet) and if the index-trait relationships vary by ecosystems (e.g., estuary versus reef). Research in trait-based aquatic food webs is ongoing (Boukal, 2014) and effort should be made that trait databases are standardized (Kremer et al., 2017) and comparable across environments like freshwater to marine plankton (Weithoff and Beisner, 2019) and scales like megafauna (Tavares et al., 2019). The identification of relevant traits is an ongoing process. Simple, yet descriptive traits (as demonstrated here) can successfully supplement food web research. The choice and the relevance of traits largely depend on the resolution of the food web: for more resoluted networks, a number of traits can be used that make no sense or cannot be obtained for highly aggregated trophic networks. Yet, aggregation and using only the most basic traits make cross-system comparison feasible. Very sophisticated traits cannot be defined for a large number of species, only for a smaller taxonomic neighborhood.
With large databases, both biological information on organisms (e.g., size) and their characterization in a system context (e.g., centrality) can be richly described. Novel algorithms (e.g., machine learning) can further help in the future to provide quantitative analyses and to reveal hidden patterns. This way, trait-based analyses have a chance to offer more than just rediscovering biological knowledge in silico (Endrédi et al., 2018a). Combinations of traits, as a major future task, can be more informative than looking at them separately.
Contributing to the predictive power of food web modeling, by combining biological information and systems analysis, may help to understand and support the management of invasive species. Their trophic and other properties are partly known and but can also be adapted to some extent during invasion. The rules and their limits can be better understood by the present research.
Although the database we used is the largest one in community ecology, described by the highest standards for comparability, it is still loaded by the traditional problems of food web research. Aggregation (defining the nodes) and weighting (defining the links) are always problematic. It will be a interesting question for future research, whether and how omics data can provide larger, more reliable information (Lima-Mendez et al., 2015;Guidi et al., 2016;D'Alelio et al., 2019) and whether this can completely replace or only complement the information we have today.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
FJ, AE, and KP designed and evaluated the results, and wrote the manuscript. SL, AE, and KP managed the database. AE performed network analysis. KP and JP performed the statistical analysis. JP and SL provided comments on the manuscript. All authors contributed to the article and approved the submitted version.