Integrating Multiple Data Types to Connect Ecological Theory and Data Among Levels

Ecological theories often encompass multiple levels of biological organisation, such as genes, individuals, populations, and communities. Despite substantial progress towards ecological theory spanning multiple levels, ecological data rarely are connected in this way. This is unfortunate because different types of ecological data often emerge from the same underlying processes and, therefore, are naturally connected among levels. Here, we present an approach to integrate data collected at multiple levels (e.g., individuals, populations) in a single statistical analysis. The resulting integrated models make full use of existing data and align statistical ecology with ecological theories that span multiple levels of organisation. Integrated models are increasingly feasible due to recent advances in computational statistics, which allow fast calculations of multiple likelihoods that depend on complex mechanistic models. We discuss recently developed integrated models and demonstrate their implementation and application using data on freshwater fishes in south-eastern Australia. Available data on freshwater fishes include population survey data, mark-recapture data, and individual growth trajectories. We use these data to estimate demographic vital rates (size-specific survival and reproduction) more accurately than previously possible. We show that integrating multiple data types enables parameter estimates that would otherwise be infeasible and argue that integrated models will strengthen the development of ecological theory in the face of limited data. Although integrated models remain conceptually and computationally challenging, integrating ecological data among levels is likely to be an important step towards unifying ecology among levels.


Statistical methods
Our aim was to estimate the parameters of a density-dependent Leslie matrix model, $\mathbf{n}_{t,r} = \frac{1}{1 + k_r N_{t-1,r}} \mathbf{A}_r \mathbf{n}_{t-1,r}$, from data on size-abundance distributions, size-at-age, and binary recapture histories. Here, $\mathbf{n}_{t,r}$ is a vector containing the abundances of each age class at time $t$ in river $r$ and $\mathbf{A}_r$ is a matrix of vital rates (survival probabilities and fecundity estimates) in river $r$. $N_{t,r}$ is the total number of individuals in the population at time $t$ in river $r$, and the parameter $k_r$ governs the strength of density dependence in river $r$, with values close to zero indicating no density dependence and positive values indicating negative density dependence. We used five age classes and eight size classes based on the following bins: (0 g, 200 g], (200 g, 500 g], (500 g, 1000 g], (1000 g, 2000 g], (2000 g, 5000 g], (5000 g, 10000 g], (10000 g, 20000 g], and (20000 g, 60000 g]. These bins were chosen arbitrarily, with unequal bin widths to avoid the majority of individuals falling into one or a few size classes.
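The projection step of this model can be sketched in a few lines of plain Python. This is a minimal illustration rather than the fitted model: the helper names `leslie_matrix` and `project` are hypothetical, the vital rates and abundances are invented for the example, and placing fecundities in the first row of the matrix is an assumption based on the standard Leslie layout.

```python
def leslie_matrix(fecundity, survival):
    """Build a Leslie matrix whose final age class is a plus-group.

    Assumption: fecundities occupy the first row (standard Leslie layout);
    survival[j] is the probability of surviving one time step from class j.
    """
    k = len(survival)
    A = [[0.0] * k for _ in range(k)]
    A[0] = list(fecundity)
    for j in range(k - 1):
        A[j + 1][j] = survival[j]        # lower diagonal: advance one age class
    A[k - 1][k - 1] = survival[k - 1]    # final class retains its survivors
    return A


def project(n_prev, A, k):
    """One time step of n_t = A n_{t-1} / (1 + k * N_{t-1})."""
    total = sum(n_prev)                  # N_{t-1}: total abundance over age classes
    scale = 1.0 / (1.0 + k * total)      # scaling factor applied to all vital rates
    return [scale * sum(a * n for a, n in zip(row, n_prev)) for row in A]


# Illustrative (not estimated) values for five age classes.
A = leslie_matrix(
    fecundity=[0.0, 0.0, 5.0, 10.0, 20.0],
    survival=[0.5, 0.6, 0.7, 0.8, 0.8],
)
n0 = [100.0, 50.0, 25.0, 10.0, 5.0]

n_no_dd = project(n0, A, k=0.0)    # k = 0: no density dependence
n_dd = project(n0, A, k=0.01)      # k > 0: negative density dependence
```

With `k = 0` the scaling factor is one and the update reduces to the ordinary matrix product; any positive `k` shrinks every age class by the same proportion, and the shrinkage grows with total abundance.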
We used time series of size-abundance distributions to estimate vital rates, using size-at-age data to convert observed size-class abundances to ages, and using binary recapture histories to estimate the probability of detecting an individual in any given survey. We connected the population matrix model to the three data types with three component likelihoods:
Size-abundance distributions ($\mathcal{L}_{\text{abundance}}$): $\mathbf{y}_{t,r} \sim \text{Poisson}(p \, \boldsymbol{\Omega} \, \mathbf{n}_{t,r})$;
Capture histories ($\mathcal{L}_{\text{capture}}$): $\mathbf{z}_i \sim \text{CJS}(p, s)$; and
Size-at-age ($\mathcal{L}_{\text{size-age}}$): $\mathbf{u}_i \sim \text{Multinomial}(v_i, \boldsymbol{\omega}_i)$.
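To make the capture-history component concrete, the following sketch evaluates a Cormack-Jolly-Seber log-likelihood for a single binary capture history. It assumes a single constant survival probability `s` and a constant detection probability `p`, and it conditions on the first capture, which is the usual CJS convention; the function name `cjs_log_lik` is hypothetical and the source does not state whether survival was constant across ages in this component.

```python
import math


def cjs_log_lik(history, p, s):
    """Log-likelihood of one binary capture history under a simple CJS model.

    Assumptions: constant survival s and detection p, and the likelihood is
    conditional on the first capture.
    """
    if 1 not in history:
        return 0.0  # never-captured individuals carry no information here
    first = history.index(1)
    last = len(history) - 1 - history[::-1].index(1)

    loglik = 0.0
    # Between first and last capture the individual is known to be alive.
    for t in range(first + 1, last + 1):
        loglik += math.log(s)
        loglik += math.log(p if history[t] else 1.0 - p)

    # chi: probability of never being seen again after the last capture,
    # built backwards from the final survey (where chi = 1).
    chi = 1.0
    for _ in range(len(history) - 1 - last):
        chi = (1.0 - s) + s * (1.0 - p) * chi
    return loglik + math.log(chi)


example = cjs_log_lik([1, 0, 1, 0], p=0.5, s=0.8)
```

Summing this quantity over individuals gives the capture-history component of the composite log-likelihood under the stated assumptions.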
The first component likelihood assumes that abundances in all size classes at time $t$ in river $r$ ($\mathbf{y}_{t,r}$) are independently Poisson-distributed, conditional on unobserved initial abundances ($\mathbf{n}_{0,r}$) and the matrix population model outlined above (i.e., age classes are connected through the Leslie matrix, $\mathbf{A}_r$). Observed abundances are reduced relative to true abundances due to imperfect detection, with a size-independent detection probability $p$. Observed size-class abundances are converted to age-class abundances by the matrix $\boldsymbol{\Omega}$, which captures the probability that an individual in size class $i$ belongs to age class $j$, for all $i$ and $j$. This model structure requires priors on the Leslie matrices ($\mathbf{A}_r$), the parameters $p$ and $s$ of the Cormack-Jolly-Seber model, the matrix $\boldsymbol{\Omega}$, age-class abundances at time 0 ($\mathbf{n}_{0,r}$), and the Beverton-Holt density-dependence parameter ($k_r$). We used a mixture of vague and vaguely informative priors, drawing on past empirical studies to inform estimates of survival and size-age associations. We did not assess sensitivity to choice of priors because our aim was to illustrate the implementation of a simple integrated model rather than present a rigorous analysis of our data.
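The abundance component can be illustrated with a short Poisson log-likelihood for one survey. This is a sketch under stated assumptions: the function names are hypothetical, the matrix `omega` is treated as having one row per size class and one column per age class (so that it enters the Poisson mean as p Ω n), and the small example uses three size classes and two age classes with invented numbers.

```python
import math


def poisson_log_pmf(y, lam):
    """Log of the Poisson probability mass function (requires lam > 0)."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def abundance_log_lik(y, n, omega, p):
    """Log-likelihood of observed size-class counts y for one survey.

    y     : observed counts per size class
    n     : modelled abundances per age class
    omega : rows = size classes, columns = age classes (assumed layout)
    p     : size-independent detection probability
    """
    total = 0.0
    for y_i, row in zip(y, omega):
        expected = p * sum(w * n_j for w, n_j in zip(row, n))  # p * (Omega n)_i
        total += poisson_log_pmf(y_i, expected)
    return total


# Invented example: three size classes, two age classes.
omega = [
    [0.8, 0.1],
    [0.2, 0.5],
    [0.0, 0.4],
]
ll = abundance_log_lik(y=[30, 20, 5], n=[80.0, 40.0], omega=omega, p=0.5)
```

Summing this over surveys and rivers, with the age-class abundances generated by the matrix model, gives the size-abundance component of the composite log-likelihood.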
Leslie matrices are sparse, with non-zero survival probabilities on the lower diagonal and on the diagonal in the final age class, which includes all individuals five years or older.
Here, parameters in bold are vectors with five elements (one for each age class), and the subscript r denotes survival values in river r. The tilde is shorthand for "is distributed as", and all Normal distributions are parameterised with means and standard deviations. A HalfNormal distribution is a Normal distribution truncated to non-negative values.
All definitions follow those used for survival probabilities.
Here, parameters in bold are vectors with one element for each age class. All other definitions follow those used for survival probabilities.
We used Dirichlet priors for the multinomial probabilities $\boldsymbol{\omega}_i$ in each size class $i$.
These priors were designed to be informative given relatively few observations of fish with known age in larger size classes. Specifically, we set the Dirichlet concentration parameters for $\boldsymbol{\omega}_i$ to $80000 \times \exp(-\alpha_{i,j}^2 / 2)$, where $\alpha_{i,j} = (10 / 7) \times (1 - i) + 2j$. This prior favours age-size associations that are relatively concentrated along the main diagonal of the age-size matrix $\boldsymbol{\Omega}$, so that larger fish are likely to be older than smaller fish.
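The concentration parameters can be tabulated directly to see where the prior places its mass. The sketch below assumes that size classes are indexed i = 1, ..., 8 and age classes j = 1, ..., 5, and that the exponent is the square of the term defined for each (i, j) pair; both are readings of the formula rather than details confirmed elsewhere in the text, and the function name `concentration` is hypothetical.

```python
import math

N_SIZE, N_AGE = 8, 5  # eight size classes, five age classes


def concentration(i, j):
    """Dirichlet concentration for size class i and age class j.

    Assumption: 1-based indices and a squared alpha term in the exponent.
    """
    alpha = (10.0 / 7.0) * (1 - i) + 2 * j
    return 80000.0 * math.exp(-alpha ** 2 / 2.0)


# One row of concentration parameters per size class.
conc = [
    [concentration(i, j) for j in range(1, N_AGE + 1)]
    for i in range(1, N_SIZE + 1)
]

# Prior mean age distribution within each size class.
prior_mean = [[c / sum(row) for c in row] for row in conc]

# Age class carrying the most prior mass in each size class.
modal_age = [1 + row.index(max(row)) for row in conc]
```

Under these indexing assumptions the modal age class never decreases as size class increases, which matches the stated intent that larger fish are likely to be older than smaller fish.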
Last, we assigned the survival probabilities ($s$) independent beta priors with both parameters equal to one and assigned the detection probability ($p$) a beta prior with both parameters equal to 10. We set independent uniform priors on the density-dependence parameters in each river ($k_r$), with lower bounds of $10^{-5}$ and upper bounds of 0.2.
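These priors are simple to simulate, which is a quick way to check what they imply before fitting. The sketch below uses Python's standard library rather than greta; the function name `draw_priors` and the number of survival parameters are hypothetical, and six rivers are used only as an illustrative count.

```python
import random


def draw_priors(n_rivers, n_survival, seed=1):
    """Draw one set of parameters from the priors described above.

    n_survival is a hypothetical count of CJS survival probabilities.
    """
    rng = random.Random(seed)
    return {
        # Beta(1, 1) is uniform on (0, 1): a vague prior on survival.
        "s": [rng.betavariate(1, 1) for _ in range(n_survival)],
        # Beta(10, 10) is centred on 0.5 and moderately informative.
        "p": rng.betavariate(10, 10),
        # Uniform prior on density dependence, one value per river.
        "k": [rng.uniform(1e-5, 0.2) for _ in range(n_rivers)],
    }


draw = draw_priors(n_rivers=6, n_survival=5)

# Monte Carlo check of the prior mean of the detection probability.
rng = random.Random(2)
p_draws = [rng.betavariate(10, 10) for _ in range(2000)]
p_mean = sum(p_draws) / len(p_draws)
```

Because the uniform prior on each density-dependence parameter has a strictly positive lower bound, every draw implies at least weak negative density dependence.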
We assumed the three component likelihoods were independent, so that the composite likelihood was the product of all three component likelihoods: $\mathcal{L} = \mathcal{L}_{\text{abundance}} \times \mathcal{L}_{\text{capture}} \times \mathcal{L}_{\text{size-age}}$. Given this composite likelihood, we used the greta R package to generate fully Bayesian
Table S2. Posterior mean, median, and 80% credible intervals for the density dependence parameter $k$ in each river system. Density dependence was modelled with a Beverton-Holt function with constant parameter $k$ for all elements of the transition matrix. Values of $k$ near zero indicate no apparent density dependence and values of $k$ greater than zero indicate negative density dependence. The prior distribution of $k$ did not allow negative values, so that positive density dependence was not included in fitted models. The effects of estimated $k$ values on vital rates at different abundances are shown in Figure S2 (below).
Figure S2. Fitted density dependence scaling factors for Murray cod in six river systems in south-eastern Australia. Scaling factors are the proportional reduction in all vital rates for a given abundance.