Modelling approaches for capturing plankton diversity (MODIV), their societal applications and data needs

Ecosystem models need to capture biodiversity, because it is a fundamental determinant of food web dynamics and consequently of the cycling of energy and matter in ecosystems. In oceanic food webs, the plankton compartment encompasses by far most of the biomass and diversity. Therefore, capturing plankton diversity is paramount for marine ecosystem modelling. In recent years, many models have been developed, each representing different aspects of plankton diversity, but a systematic comparison remains lacking. Here we present established modelling approaches to study plankton ecology and diversity, discussing the limitations and strengths of each approach. We emphasize their different spatial and temporal resolutions and consider the potential of these approaches as tools to address societal challenges. Finally, we make suggestions as to how better integration of field and experimental data with modelling could advance understanding of both plankton biodiversity specifically and more broadly the response of marine ecosystems to environmental change, including climate change.


Introduction
Plankton diversity, in terms of traits and life history strategies, mediates some of the most important ecological processes, from local to planetary scales, including the biological carbon pump (Basu and Mackey, 2018), element cycling and food web dynamics (Sailley et al., 2015), energy transfer to higher trophic levels (Havens, 1998; Sommer et al., 2018) and system productivity (Hammer and Pitchford, 2005). However, we still do not fully understand how different dimensions of plankton biodiversity impact ecological functions and services at various spatial and temporal scales. These shortcomings limit our ability to project the magnitude or even direction of their change under future scenarios.
Various mathematical and statistical tools, generically termed "models", are used to capture different dimensions of plankton diversity at various scales (e.g., Banas, 2011; D'Alelio et al., 2016; Righetti et al., 2019; Henson et al., 2021) and include particular effects of biodiversity in their projections. The specific underlying assumptions, strengths, and weaknesses of each approach may affect our interpretation of how plankton biodiversity impacts ecosystem functions. Furthermore, the necessary quantity, quality, and type of data required to validate each model differ, and in many cases observations are insufficient or not accessible for model validation (Grigoratou et al., 2022). Importantly, the diversity of modelling approaches also hinders the establishment of a dialogue and the transfer of information between data providers and users.
To address these issues, we provide a concise comparative assessment of common modelling approaches capturing plankton biodiversity to inform future choices of modelling methods and interpretation of results. We present our perspectives on the main strengths and limitations of these approaches, as well as their societal applications and data needs (Everett et al., 2017; Bardon et al., 2021).

Common modelling approaches to capture plankton biodiversity
Scientists from different fields of study have developed a wide variety of modelling approaches to study plankton biodiversity patterns, with different purposes, in some cases not even focused specifically on biodiversity. We aim to point out the general characteristics and examples of six commonly used approaches, which we categorize as: Statistical (STM), Ecological Network Analysis (ENA), Individual-Based (IBM), Plankton Functional Type (PFT), Acclimation (ACC) and Adaptive Trait-based (ATM) models. These approaches cover the common dimensions of biodiversity, including variability of genotypes, phenotypes, and the composition of communities and ecosystems (vertical axis on Figure 1). We further categorise these six modelling approaches along a "statistical vs. mechanistic" axis (horizontal axis on Figure 1), to better distinguish those models that require explicit descriptions of ecological or biological processes (mechanistic) from those methods that describe mainly empirical relationships based on field, satellite and laboratory observations (statistical).

Statistical models (STM): interpreting natural diversity
These models describe observed patterns of plankton diversity using a wide range of statistical and machine learning methods. For the most common proxy of taxonomic biodiversity, species richness, such methods (Melo-Merino et al., 2020) can be used to develop species distribution models (SDMs) based on occurrence observations, e.g. from open databases (GBIF, OBIS). Similarly, DNA/RNA sequencing data can be used to characterize diversity patterns of particular groups (e.g. viruses, bacteria, and diazotrophs). The recent availability and lower cost of these data have provided a global-scale perspective on the diversity patterns of marine phytoplankton (Righetti et al., 2019), zooplankton (Brun et al., 2016), and various planktonic groups of organisms (Ibarbalz et al., 2019). However, statistical approaches such as SDMs still face challenges associated with their static representation of dynamic marine ecosystems, where organisms disperse widely (Melo-Merino et al., 2020). Furthermore, the limited and biased distribution of observations limits our ability to untangle the temporal and spatial scale-dependence of species diversity (McGill et al., 2015). These methods are potentially useful to describe large-scale patterns with an ever-increasing set of observations; however, it remains difficult to unravel the mechanisms underlying diversity patterns and their links to ecosystem functions and services.
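As a rough illustration of the SDM idea, the sketch below fits a one-covariate logistic regression of presence/absence against temperature and stacks the fitted species models into an expected richness. It is a minimal sketch under stated assumptions, not a method taken from the cited studies: the function names, the temperature covariate and the presence/absence records are all hypothetical, and real SDMs use many covariates, bias correction and cross-validation.

```python
import math

def fit_logistic_sdm(temps, presences, lr=0.1, steps=5000):
    """Fit P(presence | temperature) by logistic regression (gradient ascent)."""
    n = len(temps)
    mean = sum(temps) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in temps) / n)
    z = [(t - mean) / sd for t in temps]  # standardise the covariate
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for zi, yi in zip(z, presences):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            g0 += yi - p          # log-likelihood gradient w.r.t. intercept
            g1 += (yi - p) * zi   # log-likelihood gradient w.r.t. slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n

    def predict(temp):
        """Predicted occurrence probability at a given temperature."""
        return 1.0 / (1.0 + math.exp(-(b0 + b1 * (temp - mean) / sd)))

    return predict

def expected_richness(models, temp):
    """Stack species models: sum of occurrence probabilities."""
    return sum(model(temp) for model in models)

# Hypothetical presence (1) / absence (0) records along a temperature gradient
temps = [2, 5, 8, 11, 14, 17, 20, 23]
warm_species = fit_logistic_sdm(temps, [0, 0, 0, 1, 0, 1, 1, 1])
cold_species = fit_logistic_sdm(temps, [1, 1, 1, 0, 1, 0, 0, 0])
```

Calling `expected_richness([warm_species, cold_species], 12)` returns a richness estimate for a site at 12 degrees. The model is purely correlative, so it inherits the static-representation limitation discussed above.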

Ecological network analysis (ENA): tools for ecosystem management
Models of static ecological networks provide a discrete representation of ecosystems by depicting energy flows from prey/resources to predators/consumers, assuming that all nodes are at steady state, i.e. equal net energy flows into and out of each trophic group. These networks are often built starting from biomass and energy budgets for each trophic group, based on metabolic parameters such as consumption/biomass and production/biomass ratios. Various studies apply ENA to model planktonic food web functioning. They employ information theory indices to assess ecosystem stability (Scotti et al., 2022), quantify carbon reuse through cycling analysis (Tecchio et al., 2016), and rely on input-output analysis to estimate transfer efficiency along a chain of discrete trophic levels (Saint-Béat et al., 2020). For example, D'Alelio et al. (2016) studied the structure of energy circulation and found little difference in trophic efficiency between phytoplankton bloom and non-bloom periods. ENA indices can also detect the impacts of anthropogenic stressors on planktonic food webs, and were used to show that eutrophication disrupts the prevalence of pathways for energy transfer to fish (Meddeb et al., 2018). Limitations of this approach include the strong assumption that each node remains at a steady state, and the amount of data required to validate models including many different trophic levels.
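The steady-state assumption and the notion of transfer efficiency can be made concrete with a small flow matrix. The sketch below is illustrative only: the three-compartment chain, its flow values and the function names are hypothetical, and the efficiency shown is a simple per-compartment ratio rather than any specific index from the cited ENA studies.

```python
def node_balance(flows, inputs, losses):
    """Return inflow minus outflow for every compartment.

    flows[i][j] is the flow from compartment i to compartment j,
    inputs[i] is import from outside the system (e.g. primary production),
    losses[i] lumps respiration and export out of the system.
    """
    n = len(flows)
    return [
        inputs[i] + sum(flows[j][i] for j in range(n))
        - losses[i] - sum(flows[i][j] for j in range(n))
        for i in range(n)
    ]

def is_steady_state(flows, inputs, losses, tol=1e-9):
    """True if every compartment's inflow equals its outflow (within tol)."""
    return all(abs(b) <= tol for b in node_balance(flows, inputs, losses))

def transfer_efficiency(flows, inputs, i):
    """Fraction of compartment i's total inflow passed on to other compartments."""
    n = len(flows)
    inflow = inputs[i] + sum(flows[j][i] for j in range(n))
    return sum(flows[i][j] for j in range(n)) / inflow

# Hypothetical chain: phytoplankton -> zooplankton -> fish (arbitrary units)
flows = [[0.0, 20.0, 0.0],
         [0.0, 0.0, 2.0],
         [0.0, 0.0, 0.0]]
inputs = [100.0, 0.0, 0.0]
losses = [80.0, 18.0, 2.0]
```

Here `is_steady_state(flows, inputs, losses)` holds because each compartment's budget closes. Changing any single flow without rebalancing the others breaks the assumption, which is one reason data demands grow quickly with the number of compartments.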

Individual-based models (IBM): close to real life interactions and evolution
Individual-based models (IBMs, also called agent-based models) are iterative algorithms that apply a set of rules to, e.g., individuals of a population, thereby simulating life cycles, from birth to death (DeAngelis and Mooij, 2005; Grimm et al., 2005; Hellweger and Bucci, 2009). IBMs allow ecosystem properties to emerge from traits and interactions of individuals, including randomness through the process of replication, where offspring inherit traits from their parent(s), with mutation between each generation (Melián et al., 2011). Furthermore, events like random death or encounter may be modeled, which allows studying the importance of such discrete events in population dynamics (Picq et al., 2019). IBMs are used to investigate the effects of molecular and physiological processes on global plankton biogeography (Hellweger et al., 2014) and global N:P ratios (Toseland et al., 2013). In combination with individual-level observations, these models hold great potential to advance understanding of how individual-level processes impact multiple levels of ecological organization across spatial scales (Kreft et al., 2013; Hellweger et al., 2016). The general lack of any analytical treatment limits the derivation of insights for correlative analysis, but see Black and McKane (2012). The most important limitation for IBMs is likely their massive computational requirements for implementing simulations and analyzing the output.
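The rule-based loop described above (random death, reproduction, inheritance with mutation) can be sketched in a few lines. This is a toy example under stated assumptions rather than any published IBM: the single continuous trait, the Gaussian fitness function, the crowding term and all parameter values are hypothetical choices made for illustration.

```python
import math
import random

def run_ibm(steps=200, n0=100, capacity=500, optimum=1.0,
            death_prob=0.1, birth_scale=0.5, mutation_sd=0.05, seed=0):
    """Simulate a population of individuals, each carrying one trait value."""
    rng = random.Random(seed)
    traits = [rng.uniform(0.5, 1.5) for _ in range(n0)]
    for _ in range(steps):
        # Rule 1: each individual dies at random with a fixed probability.
        survivors = [x for x in traits if rng.random() > death_prob]
        # Rule 2: reproduction is more likely near the trait optimum
        # and is suppressed as the population approaches capacity.
        crowding = max(1.0 - len(survivors) / capacity, 0.0)
        offspring = []
        for x in survivors:
            fitness = math.exp(-(x - optimum) ** 2)
            if rng.random() < birth_scale * fitness * crowding:
                # Rule 3: offspring inherit the parental trait plus a mutation.
                offspring.append(x + rng.gauss(0.0, mutation_sd))
        traits = survivors + offspring
        if not traits:
            break  # extinction
    return traits

final_traits = run_ibm()
```

Population size and the trait distribution are not prescribed anywhere in the code; they emerge from the individual-level rules. Even this toy version loops over every individual at every step, which hints at the computational cost noted above.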

Plankton functional type models (PFT): modularity and function
Figure 1. Six modelling approaches commonly used to capture various dimensions of plankton biodiversity. We further organize the modelling approaches along an axis spanning from those models that require explicit descriptions of ecological or biological processes (mechanistic) to those methods that do not (statistical).

Plankton functional type (PFT) models group organisms based on the similarity of their ecological or biogeochemical functionality. PFT models originated from the first, simple plankton community models resolving nutrients, phytoplankton, zooplankton and detritus (NPZD), and later the microbial loop (e.g. Fasham et al., 1990). As PFT models have traditionally been used to study the biogeochemical function of plankton, they have been criticized for representing poorly the eco-physiological differences between functional types (Anderson, 2005), and have recently been extended to include such detail by describing key traits and trade-offs (Follows et al., 2007; Follows and Dutkiewicz, 2011). PFT models are applied for time scales from days to millennia, and are commonly used in regional, basin-scale and global applications for understanding and assessing the ecosystem response to environmental conditions, such as nutrient loading rates (Lancelot et al., 2007), acidification (Artioli et al., 2012), intensity of fishing efforts (Petrik et al., 2019), and climate change (Kwiatkowski et al., 2019). However, diversity is typically limited to the number of functional groups modelled, and is typically resolved in detail only for selected trophic levels, at substantial computational cost. This makes it challenging to analyse and disentangle diversity effects from other spatio-temporal dynamics. Furthermore, PFT models that describe trait-spaces typically assume fixed (non-adaptable) trait values along multiple trait dimensions, while resolving many fewer ecotypes than are observed in nature. This limits their ability to capture the adaptive capacity of ecosystems in response to environmental changes (Ward et al., 2019).
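The NPZD structure from which PFT models grew can be written as four coupled rate equations. The sketch below uses generic Michaelis-Menten uptake and Holling type II grazing with forward Euler time stepping; the functional forms, parameter names and values are assumptions chosen for illustration, not those of Fasham et al. (1990) or any other cited model.

```python
def npzd_step(state, dt, mu=1.0, k_n=0.5, g=1.0, k_p=0.5,
              beta=0.3, m_p=0.05, m_z=0.1, r=0.1):
    """Advance nutrient, phytoplankton, zooplankton, detritus by one Euler step."""
    n, p, z, d = state
    uptake = mu * n / (k_n + n) * p    # nutrient-limited phytoplankton growth
    grazing = g * p / (k_p + p) * z    # saturating grazing by zooplankton
    dn = -uptake + r * d                              # remineralisation returns nutrient
    dp = uptake - grazing - m_p * p
    dz = beta * grazing - m_z * z                     # beta: assimilation efficiency
    dd = (1.0 - beta) * grazing + m_p * p + m_z * z - r * d
    return (n + dt * dn, p + dt * dp, z + dt * dz, d + dt * dd)

def run_npzd(state=(4.0, 0.5, 0.2, 0.0), dt=0.05, steps=2000):
    """Integrate the closed NPZD system; time in days, stocks in arbitrary units."""
    for _ in range(steps):
        state = npzd_step(state, dt)
    return state

final_state = run_npzd()
```

Because every loss term reappears as a gain elsewhere, the total stock N + P + Z + D is conserved. A PFT model extends this template by splitting P and Z into several functional types, each with its own fixed parameter set, which is where the cost and the fixed-trait limitation described above come from.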

Acclimation models (ACC): from physiology to community dynamics
Acclimation is the ability of organisms to adjust their physiology and behavior (phenotypes or traits that are not inherited) to enhance their fitness in a changing environment (Laws and Bannister, 1980; Smith et al., 2011). Although many models ignore this important ability, plankton acclimation models exist (Pahlow et al., 2008; Bonachela et al., 2013; Pahlow and Oschlies, 2013; Wirtz and Kerimoglu, 2016). It is challenging to disentangle the relative contributions of acclimation and evolutionary adaptation to overall ecosystem response, because in the short term they may have either similar or very different tendencies (Moreno and Martiny, 2018), which in the long term are inter-dependent (Edelaar and Bolnick, 2019). Representing physiological flexibilities as an acclimative response is an effective and efficient way of modelling certain effects of diversity, which avoids the heavy computational burden of representing biodiversity explicitly, e.g., by using multiple PFTs (see above). This approach has been used to study how physiological flexibilities impact the global and regional biogeography of elemental ratios and primary production (Pahlow et al., 2020), energy transfer efficiency to higher trophic levels (Chakraborty et al., 2020), response to eutrophication (Kerimoglu et al., 2018) and climate change (Kwiatkowski et al., 2018), and formation of deep chlorophyll maxima (Masuda et al., 2021). Acclimation models are often not standalone, and one of their strengths is that they are readily incorporated into a variety of models, such as PFTs, IBMs, and ATMs (see below), to represent plankton diversity (Ross et al., 2011; Smith et al., 2016; Chen et al., 2019; Cadier et al., 2020). Arguably the greatest limitation of ACCs is that optimality solutions can become quite intricate and, because of interdependencies between various cellular functions, any change in a model formulation may require the re-derivation of existing solutions.

Adaptive trait-based models (ATM): describing trait dynamics of communities across ecological scales
Adaptive trait-based approaches focus on the dynamics of functional traits as key outputs, rather than inputs to models (Klausmeier et al., 2020), and they often overlap with more widely used modelling approaches such as PFTs and ACCs. ATMs commonly represent traits in two contrasting ways: as either a full distribution or an aggregate approximation via the moment-closure method (Wirtz and Eckhardt, 1996; Norberg et al., 2001; Merico et al., 2009). In the full distribution approach traits are "free" to evolve in response to the current selection pressure (Bruggeman and Kooijman, 2007; Banas, 2011; Gaedke and Klauschies, 2017). This approach can be computationally demanding for large-scale applications, which can hinder mechanistic understanding of biomass-trait feedbacks. In contrast, the aggregate approach, typically applied to communities, must assume a specific shape for the trait distribution, which makes it computationally efficient and allows direct insights into the mechanisms underlying changes in aggregate properties, namely total biomass, mean trait, and trait variance (Chen and Smith, 2018; Klauschies et al., 2018). The concise nature of the aggregate approach makes it useful for both applied (e.g. Terseleer et al., 2014; Acevedo-Trejos et al., 2015) and fundamental (e.g. Coutinho et al., 2016; Guill et al., 2021) research questions related to plankton diversity, but this comes at the cost of limited ability to accurately resolve the fitness landscape and inability to capture certain observed diversity distributions.
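The aggregate approach can be illustrated by tracking only total biomass, mean trait and trait variance. The sketch below assumes a normally distributed trait and a second-order expansion of the growth rate r(x) around the mean, which gives: biomass change proportional to r + (1/2) v r'', mean-trait change equal to v r', and variance change equal to v^2 r''. The quadratic fitness function, the density-dependent loss term and all parameter values are hypothetical and are not taken from the cited models.

```python
def run_aggregate_atm(biomass=1.0, mean=0.0, var=0.2, r_max=1.0, a=0.5,
                      x_opt=1.0, m=0.1, dt=0.01, steps=5000):
    """Integrate biomass, mean trait and trait variance with forward Euler.

    Growth rate as a function of trait x: r(x) = r_max - a * (x - x_opt)**2.
    """
    for _ in range(steps):
        r = r_max - a * (mean - x_opt) ** 2      # growth at the mean trait
        dr = -2.0 * a * (mean - x_opt)           # first derivative r'(mean)
        d2r = -2.0 * a                           # second derivative r''(mean)
        # Variance lowers community growth when fitness is concave (d2r < 0);
        # m * biomass is a trait-independent, density-dependent loss.
        d_biomass = biomass * (r + 0.5 * var * d2r - m * biomass)
        d_mean = var * dr                        # mean climbs the fitness gradient
        d_var = var ** 2 * d2r                   # stabilising selection erodes variance
        biomass += dt * d_biomass
        mean += dt * d_mean
        var += dt * d_var
    return biomass, mean, var

final_biomass, final_mean, final_var = run_aggregate_atm()
```

Three state variables replace an entire trait distribution, which is the source of the efficiency noted above. The sketch also shows the cost: without an added source of variance (e.g. immigration or mutation), the variance can only shrink under this fitness function, and the assumed normal shape cannot represent multimodal trait distributions.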

Discussion
The modelling approaches described above have both shared and contrasting characteristics, which allow them to address specific societal applications (see the Societal applications section below). However, they all share a common need for improved data inputs (see the Unmet needs for data and collaboration section below). We illustrate these differences and similarities with examples from the literature (Figure 2) and elaborate in the sections below.

Strengths and limitations
Strengths of these modelling approaches are that they (1) facilitate direct links and comparisons to observed diversity patterns (STMs, IBMs, ATMs), (2) have applications to societal benefits (STMs, ENAs, PFTs), (3) can capture multilevel ecological complexity (IBMs and ACCs), and (4) have computational efficiency and analytical accessibility (ACCs and ATMs). Some of their respective disadvantages are: (1) heavy computational requirements that limit applications (mainly IBMs, but also PFTs), (2) static representation of plankton dynamics (STMs, ENAs), and (3) lack of ecological or biological complexity (ENAs, PFTs, ATMs, ACCs). These strengths and limitations demonstrate that each of the above approaches is particularly suited to specific applications, with no single approach capturing all aspects of plankton diversity. Therefore, different approaches may be needed to capture biodiversity within different trophic levels or functional groups.

Societal applications
Some modelling approaches, such as ENA and PFTs, can inform managers and policy-makers about the status of ecosystem integrity and its deterioration, for example, due to anthropogenic stressors (de la Vega et al., 2018; Fath et al., 2019; Nogues et al., 2021). Trophic networks studied with ENA often include all ecosystem components, making them suitable to assess the balance between trophic guilds as requested by EU legislation (see MSFD descriptor D4C2). Moreover, ENA indices are among the food web indicators proposed in the framework of OSPAR to capture the whole-system properties of marine food webs (Niquil et al., 2014). Nevertheless, defining the reference levels or thresholds to quantify deviations from the Good Environmental Status (GES) remains challenging, as ecosystems attain different stable configurations over time (Tomczak et al., 2021). PFTs are used as the biogeochemical cores of Earth System Models (ESMs), thus they play a key role in future climate simulations. PFTs are also used to model Harmful Algal Blooms (HABs) and as part of larger models for predicting the effects of climate and fishing yield on key populations, which in turn inform stock assessments and policy decisions. Both ESM and fisheries models typically resolve only simplistic abstractions of plankton diversity (Kwiatkowski et al., 2019; Petrik et al., 2020), which is insufficient to capture functional diversity. This has raised concerns about the ability of ESMs and fisheries models to project future responses, especially in terms of community resilience and energy fluxes to higher trophic levels (Heneghan et al., 2016; Tittensor et al., 2021).

Figure 2. Key characteristics of six modelling approaches commonly used to capture various dimensions of plankton diversity. The inset of each model description depicts an idealized model output, where the term "diversity" represents a specific measure of diversity for each dimension, i.e. Genetical, Organismal, Functional, Taxonomical, or Ecological. A better overview of model outputs is available through the examples mentioned for each method. In addition, we summarize the main data requirements, outputs and common societal applications for each approach. Note that some entries can be both data needs and model outputs, because a variable can be used to parametrize and/or validate a model but can also be computed or predicted by the model.

Unmet needs for data and collaboration
There is a growing need and recurrent call to contrast model estimates and predictions with empirical observations (Kremer et al., 2017). However, relevant traits across trophic levels remain scarcely measured, rarely consistently reported (different units, lack of adequate metadata including details on data quality and methods) and rarely shared in public data repositories (if so, typically with poor data latency). With only 7% of the ocean so far actively covered by long-term biological observations (Satterthwaite et al., 2021), current access to even the most basic biogeographic data on plankton biomass and abundance remains very limited (and virtually non-existent for microbes/protists) and often hampered by its spread over various platforms and institutions. This is despite the recognition of phytoplankton, zooplankton and microbial biomass and abundance as Essential Ocean Variables (EOV), Essential Biodiversity Variables (EBV) (Miloslavich et al., 2018) and even Essential Climate Variables (GCOS, 2021). Development and validation of all presented modelling approaches would benefit in particular from data on stoichiometric composition (C, N, P, Chl-a), traits (e.g. cell/body size and shape) and biomass measured across a full spectrum of environmental conditions (temperature, light, nutrients), with photosynthetic rates, nutrient uptake and respiration rates also critically needed. In addition, comparisons of modelling approaches with molecular data remain scarce, despite the recent availability of a considerable amount of data, which hold great potential as shown with individual-based comparisons (Hellweger, 2020).
Obtaining quantitative individual-level data is crucial, yet remains a particular challenge because widely available genetic data cannot be converted accurately into biomass or cell abundance (Kelly et al., 2019; Piwosz et al., 2020; Milivojević et al., 2021). We call for free, open access to plankton EOV and EBV data, and their enhanced and sustained observations, which would enable further integration of field observations, remote sensing products, and experimental studies with modelling approaches. Therefore, we advocate following the FAIR principles, i.e. Findability, Accessibility, Interoperability and Reusability (Wilkinson et al., 2016), for the sustainable management of plankton data. Adherence to these principles would ensure exploitation of the potential that the various approaches hold and foster building a new generation of models and decision-support tools for effective management of ecosystem services linked to plankton diversity.
More meaningful collaborations are also needed between the scientists who plan and conduct laboratory experiments, oceanic observations, and modelling studies. Proposals are rarely planned with a holistic view for combining experiments, observations and modelling. Hence, the scientists who develop and apply models are often not aware of the quantity and quality of available data relevant to their applications, while the scientists who plan and conduct laboratory experiments and oceanic observations are often not aware of the limitations of relevant modelling approaches and their results (e.g., Everett et al., 2017). The comparative assessment provided here aims to raise awareness and stimulate discussions in the planning phase of collaborative studies seeking to combine observations and modelling, which is the best and perhaps only hope for understanding any complex system (Bar-Yam, 2016).

Challenges and future directions
Approaches differ in their diversity-sustaining mechanisms, which relate to the positive effect of species diversity on productivity (Loreau, 2004). However, a higher diversity of coexisting species does not necessarily imply higher productivity. For example, although species can coexist via the competition-colonisation trade-off (i.e., species that are better at exploring unutilized spaces have poorer competitive capabilities), this coexistence does not lead to transgressive overyielding (Loreau, 2004). This may be particularly relevant in spatially explicit ocean models, where mixing can sustain plankton diversity (Chen et al., 2019; Dutkiewicz et al., 2020; Masuda et al., 2021). Another potentially relevant determinant of individual-level fitness (Edelaar and Bolnick, 2019), and therefore coexistence and diversity maintenance, is acclimative flexibility, which is being actively studied within the framework of Modern Coexistence Theory (Chesson, 2000; Barabás et al., 2018). One promising target for further research would be to examine how acclimation affects equalizing and stabilizing effects (Adler et al., 2007; Lankau, 2011). Since rapid evolution may promote species coexistence in diverse predator and prey communities by altering interspecific niche and fitness differences (Klauschies et al., 2016), we may also expect a positive impact of acclimation on species coexistence.
For understanding the response of plankton biodiversity in particular, and aquatic ecosystems more broadly, to global and other environmental changes, it is essential to develop better representations of the adaptive capacity of life in large-scale simulations, e.g., in ESMs used to model climate change. Hence, there is a pressing need to validate ACC and ATM approaches in such spatially explicit applications. These approaches could potentially capture important feedback responses between ecology, evolution, and environmental conditions in ESMs (Bonan and Doney, 2018; Ward et al., 2019).