From Data to Marine Ecosystem Assessments of the Southern Ocean: Achievements, Challenges, and Lessons for the Future

Southern Ocean ecosystems offer numerous benefits to human society and the global environment, and maintaining them requires well-informed and effective ecosystem-based management. Up-to-date and accurate information is needed on the status of species, communities, habitats and ecosystems and on the impacts of fisheries, tourism and climate change. This information can be used to generate indicators and undertake assessments to advise decision-makers. Currently, most marine assessments are derivative: they rely on the review of published peer-reviewed literature. More timely and accurate information for decision making requires an integrated Marine Biological Observing and Informatics System that combines and distributes data. For such a system to work, data need to be shared according to the FAIR principles (Findable, Accessible, Interoperable, and Reusable), and the system needs to use transparent and reproducible science, adhere to the principles of action ecology and complement global initiatives. Here we aim to provide an overview of the components of such a system currently in place for the Southern Ocean, the existing gaps and a framework for a way forward.


INTRODUCTION
The Southern Ocean covers ∼10% of the global ocean and plays a pivotal role in biogeographic processes. It is an essential contributor to oceanic primary production and biodiversity, exports nutrients and oxygen to the world's ocean basins, and plays a central role in global ocean circulation (Xavier et al., 2016). The world's oceans have seen tremendous change in the last 30 years, including changes to species, communities, habitats, and ecosystems (collectively referred to as ecosystem changes). These ecosystems underpin critical Southern Ocean ecosystem services, including (but not limited to) climate regulation, fisheries, tourism, and aesthetic value (Grant et al., 2013; Cavanagh et al., 2021).
The Antarctic Treaty System includes the Protocol on Environmental Protection to the Antarctic Treaty (also known as the Madrid Protocol) and the Convention on the Conservation of Antarctic Marine Living Resources (CCAMLR). The Protocol recognises "the intrinsic value of Antarctica, including its wilderness and aesthetic values and its value as an area for the conduct of scientific research, in particular, research essential to understanding the global environment" (Protocol on Environmental Protection to the Antarctic Treaty 1991, Article 3.1), while CCAMLR aims to ensure the "rational use" of marine living resources subject to "principles of conservation" (CCAMLR, 1980, Article II). As such, there is an urgent need to balance the impact of cumulative stressors and the continued use of marine ecosystem services provided by the Southern Ocean (Grant et al., 2013). Furthermore, there is a strong spirit of international cooperation ingrained within the framework of the Antarctic Treaty, including the stipulation that "Scientific observations and results from Antarctica shall be exchanged and made freely available" (Antarctic Treaty 1959, Article 3).
The second UN Decade of Ocean Science for Sustainability (2021-2030) began in 2021 and focuses on developing "the science we need for the ocean we want" (IOC, 2020). The Decade is closely linked to the United Nations' 17 Sustainable Development Goals (SDGs), established in 2015 as a universal call to action to end poverty, protect the planet and ensure that all people enjoy peace and prosperity by 2030. SDGs such as SDG 2 Zero Hunger and SDG 8 Decent Work and Economic Growth will have to be balanced with SDG 13 Climate Action and SDG 14 Life Below Water.
Biodiversity is a fundamental aspect of marine ecosystem health, and it upholds many ecosystem functions and services. Changes in species distribution and abundance form the foundation of ecosystem changes at the level of habitats, species, communities, and food webs (Brasier et al., 2019). In turn, these complex ecosystem changes will also affect the marine ecosystem services reliant upon them.
Managers and policymakers need access to the right tools to make informed decisions. An integrated end-to-end system, in which each component builds upon and is informed by the other parts, represents the optimum tool for this job. Translating observations and knowledge into policy-ready advice requires a system that supports selecting indicators, conducting assessments, making predictions or projections, and making the appropriate decisions (Benson et al., 2018; Muller-Karger et al., 2018b; Canonico et al., 2019).
In an effort to optimise biodiversity monitoring initiatives, two synergistic global efforts identified specific priority variables for monitoring life in the sea and on land: Essential Ocean Variables (EOVs) through the Global Ocean Observing System (GOOS), and Essential Biodiversity Variables (EBVs) from the Group on Earth Observations Biodiversity Observation Network. These efforts strive to form the basis of efficient and coordinated monitoring programs worldwide (Muller-Karger et al., 2018b) while allowing for regional-scale monitoring frameworks and "essential variables" to be developed concurrently. Within that broader context, the Southern Ocean Observing System (SOOS) led the creation of Southern Ocean specific ecosystem Essential Ocean Variables (eEOVs; Constable et al., 2016). These variables are defined as the derived measurements required to study, report, and manage biodiversity change, designed to play the role of brokers between research and monitoring initiatives and decision-makers.
These different types of variables serve as a guide for the data necessary for monitoring changes in biodiversity. While they all share a common rationale, there are differences between these frameworks. EOVs look at biomass and diversity, and cover and composition. EBVs focus on genetic composition, species populations, species traits, community composition, ecosystem function, and ecosystem structure. eEOVs focus on general ecosystem properties such as spatial arrangements of taxa, food web structure and function, and anthropogenic pressures.

Toward an Integrated Marine Biological Observing and Informatics System
Developing the information needed for marine ecosystem assessments to understand the impact of changes on the Southern Ocean marine ecosystem [such as the Marine Ecosystem Assessment for the Southern Ocean (MEASO)] requires an integrated system of marine biological observations and informatics, as defined by Benson et al. (2018), that is compatible with global systems but also addresses the specific properties of the Southern Ocean. They propose a cyclical architecture (Figure 1) where each component is connected to and helps to inform the other components. The system should build upon existing platforms and standards, adding to them where needed. Furthermore, such a system cannot be static but must enable the inclusion of new methods and ideas. It also needs to be transparent and traceable, which can be achieved by ensuring that data, algorithms and tools are shared according to the FAIR principles (Findable, Accessible, Interoperable, and Reusable; Wilkinson et al., 2016). Central to such a system is a minimal set of variables (EOVs, EBVs, and eEOVs) measured using comparable methods across time and space, which in turn relies on monitoring and the use of shared protocols. Data need to adhere to international standards (such as the Darwin Core standard) and fit the FAIR principles. The system also requires analytical algorithms, tools, and workflows that are shared and include documented provenance.
The principles of open and free data are ingrained in the Antarctic Treaty. At a global level, the benefits of open and free data are ever more recognised, and data policies encouraging data sharing have been widely adopted. The FAIR principles provide guidelines for data management that give data greater value and enhance their propensity for reuse and sharing, at scale by machines. As such, they provide a framework for integrative scientific discovery and policy utilisation. Furthermore, the FAIR principles can be applied to non-data assets such as analytical workflows.
Following the principles of action ecology (White et al., 2015), policymakers and managers tasked with overseeing the Southern Ocean require timely responses to their needs and questions. Those responses need to: (1) address the most pressing ecological problems, (2) include transdisciplinary input from a range of researchers and stakeholders, (3) be conducted in open access ways, and (4) use technology and globally integrated data resources, such as the Ocean Biodiversity Information System (OBIS) and the Global Biodiversity Information Facility (GBIF). By incorporating the key characteristics of action ecology, the scientific community can ensure their efforts are efficiently and effectively incorporated into policy and management. Focusing on rapid analysis, using publicly accessible resources, and open access methods leads to shorter intervals between data and knowledge as well as opportunities for incorporating slight adjustments and re-running analyses.
FIGURE 1 | Conceptual representation of the different components of an integrated system for the Southern Ocean centred around Essential Variables (EOVs, EBVs, and eEOVs) and linked to the FAIR Principles (red) and IPY data vision (purple).
At the start of the UN Decade, we have a unique opportunity to develop a framework that can support decision making for the Southern Ocean. In this paper, we examine the resources that currently exist and the gaps in those resources, and provide a perspective on how we can work together to develop data management and e-science facilities that can link biodiversity observations and biodiversity change modelling to inform decision making (Beja et al., in press). This paper complements the "audit" of the materials and methods available for the first MEASO provided by Brasier et al. (2019). It does not aim to identify specific data gaps; rather, it seeks to provide a set of recommendations on the shape and principles of an integrated marine biological observing and informatics system that should underpin future MEASOs, with a strong focus on biological data.

HISTORICAL PERSPECTIVE
The Southern Ocean has a long history of exploration and scientific activities (Figure 2). The expeditions of Cook and Ross to the Southern Ocean in the late 18th and early to mid-19th century marked the onset of intensive exploration of this region. The observation by Cook of large numbers of seals and whales in high latitudes led to a rush of sealers toward the Southern Ocean. From 1784 onward, they would hunt in the region of South Georgia, the Falkland Islands, Cape Horn, the South Sandwich Islands, and the coast of South America. By 1825, some populations of fur seals were close to extinction, and sealers began hunting elephant seals and some species of penguins for their oil. 1904 saw the start of extensive exploitation of all seven species of whales found in the Southern Ocean (Walton, 2013).
While the scientific focus of the first (1882-1883) and second (1932-1933) International Polar Years was firmly on the Arctic region, there was an increase in exploration of the Antarctic and the Southern Ocean, starting with the Belgica expedition (1897-1899) and ending with the Discovery Investigations. These expeditions of the early age of Antarctic exploration form the basis of the current records of Southern Ocean biodiversity (Griffiths, 2010; Brasier et al., 2019).
The global increase in collected data, especially during the second Polar Year, and subsequent loss of collected data during the Second World War made the need for a World Data Centre clear (Beja et al., in press). During the International Geophysical Year (IGY, 1957-1958), nations collaborated on earth science topics across the globe. One of the requirements set out by the IGY was that "all observational data shall be available to scientists and scientific institutions in all countries" (Odishaw, 1959). This led the International Council of Scientific Unions to develop the World Data Centres, which were reformed in 2008 into the World Data System that we know today. Unlike the previous polar years, the IGY included 18 months of fieldwork in the Antarctic. As a result, the IGY is also referred to as the 3rd International Polar Year (Bailey, 2013). Besides the development of the World Data Centres, this also led to the establishment of the Scientific Committee on Antarctic Research (SCAR) by the International Council of Scientific Unions. The 12 countries involved in the Antarctic component went on to sign the Antarctic Treaty in 1959. This treaty set apart the Antarctic continent and surrounding oceans (the area south of 60°S) for peaceful scientific collaboration (Walton, 2013). Within the Antarctic Treaty System, two instruments were established to promote the preservation and conservation of living resources in the Southern Ocean. The first was the "Measures for the Conservation of Antarctic Fauna and Flora," agreed in 1964, which entered into force in 1982. This was followed by the "Convention for the Conservation of Antarctic Seals" (agreed in 1972; entered into force 1978), with the objectives of protection, scientific study, and rational use of Antarctic seals, and of maintaining a satisfactory balance within the ecological system of the Antarctic Treaty area.
Meanwhile, the sub-Antarctic was characterised by intensive harvesting of finfish from the late 1960s to the mid-1970s, along with the emergence of interest in the large-scale exploitation of Antarctic krill, Euphausia superba. As the most notable research activities into krill stemmed from the Discovery Investigations in the late 1920s and 1930s, there was no adequate information concerning the biology and stocks of these resources. This led to the Biological Investigations Of Marine Antarctic Systems and Stocks (BIOMASS) project initiated by SCAR in 1972, which resulted in various internationally coordinated expeditions from 1980 to 1985. Discussion within the Antarctic Treaty on the conservation of Antarctic marine living resources commenced in 1975 and led to the establishment of the Commission for the Conservation of Antarctic Marine Living Resources in 1982 (El-Sayed, 1994).
Technological advances from the 1980s to the early 2000s saw an important shift in terms of data management. The increasing use of the internet and email in the 1990s created a shift from paper to digital resources and set the scene for another international effort. By the start of the 21st century, there was a growing need to coordinate large-scale access to biodiversity data. This led to various initiatives both on land and in the marine realm.
Sequencing of proteins and DNA started in earnest in the 1970s but was a time-consuming and costly process. Nevertheless, comparisons between different research groups led to various unexpected discoveries. By the late 1970s, there was a growing consensus for the development of an international database of nucleic acid sequence data. In 1982, GenBank was started with funding from the National Institutes of Health. Similar initiatives were started in Europe (European Molecular Biology Laboratory) and Japan (DNA Data Bank of Japan). Currently, these three groups collaborate under what is known as the International Nucleotide Sequence Database Collaboration.
Building upon a request from the Antarctic Treaty Consultative Meeting (1985) Under the International Union for Biological Sciences, the Taxonomic Database Working Group (TDWG) was set up in 1985, focussing on data standards for plant taxonomic databases. Over the years, this working group has expanded its scope to include general taxonomic data (1995) and, ultimately, the development of standards for publishing and integrating biodiversity information. This included a loosely defined set of terms that can be considered the first iteration of the Darwin Core standard. This set of terms was further developed, and in 2009, the first ratified Darwin Core standard was published (Wieczorek et al., 2012). While the acronym TDWG is still used, the group has had various name changes in order to better represent its ongoing activities, and as of 2006, it is known as Biodiversity Information Standards.
The Census of Marine Life was a decadal global effort (2000-2010) to assess and explain the diversity, distribution, and abundance of life in the oceans (Costello et al., 2010). The Census was divided into various regional programmes, such as the Census of Antarctic Marine Life (CAML). The Census paved the way for various initiatives related to marine biodiversity data; two of the most important developments were the creation of OBIS and the World Register of Marine Species. In 2000, OBIS was launched as a framework where scientists and others can discover both historic and new data on species distributions and abundances in the world's oceans (Grassle, 2000). Twenty years later, OBIS remains one of the principal legacies of the Census. The World Register of Marine Species, created in 2007, aims to "provide freely online the most authoritative list of names of all marine species ever published." Several separate portals within it, each with a taxonomic or regional focus, have expanded the scope of its holdings beyond extant marine species to include non-marine and/or fossil representatives.
The Global Biodiversity Information Facility (GBIF) was created in 2001, based on recommendations to the Organization for Economic Cooperation and Development (OECD, 1999). These recommendations recognised that worldwide access to biodiversity data and information could provide many economic and social benefits and enable sustainable development by providing sound scientific evidence. Equally, they recognised that an international mechanism was needed for this purpose. GBIF aggregates freely and openly available species occurrence data from around the world and makes them available for use in research and policymaking (Robertson et al., 2019).
The Census of Antarctic Marine Life (CAML; 2005-2010) was a 5-year international project that focused attention on the ice-bound oceans of Antarctica, and which coordinated 18 research voyages in Antarctica during the IPY and/or within the CAML life-span (Schiaparelli et al., 2013). It was part of both the IPY and the Census of Marine Life (CoML; Gutt et al., 2010). The main objective of CAML was to understand the biodiversity of the Southern Ocean and set reference baselines to allow subsequent measurements of change.
The 4th International Polar Year (2007-2008) demonstrated that the volume of data that was collected was less of a challenge than its heterogeneity and the cultural diversity of the collectors, making the challenge sociological as much as it was technical. Thus, a vision was developed that (polar) data should be Discoverable, Open, Linked, Useful, and Safe (Parsons et al., 2011).
The SCAR Marine Biodiversity Network (SCAR-MarBIN) was an information network that was part of the IPY data ecosystem and had these principles deeply ingrained in its development. SCAR-MarBIN was a sister project to CAML. It was initiated to establish a web-based inventory of Antarctic marine biodiversity focusing on three main data types: taxonomy, biogeographic data, and metadata. It was quickly adopted by SCAR and is the regional node of both OBIS and GBIF. Building upon a first block contributed by Clarke and Johnston (2003), the first Register of Antarctic Marine Species was compiled and published (De Broyer and Danis, 2011) in 2005 as a thematic node of the World Register of Marine Species mentioned above. As the scope expanded beyond the marine environment, SCAR-MarBIN developed into the SCAR Antarctic Biodiversity Portal (www.biodiversity.aq), a Register of Antarctic Species as part of the World Register of Marine Species, and the Lifewatch Taxonomic Backbone.
The Marine Biodiversity Observation Network (MBON) is a "coalition of the willing" who agree to share knowledge and know-how to evaluate changes of biodiversity in the ocean, including data, products, protocols and methods, data systems and software. MBON was established as a theme of the GEO BON (Group on Earth Observation Biodiversity Observation Network) in 2014. In 2021, SCAR Antarctic Biodiversity Portal was recognised as a regional MBON node.
The rise of molecular techniques to study living organisms (especially microorganisms such as phytoplankton and bacteria) led to an exponential increase in nucleotide sequence data, including genes, genomes, and metagenomes. To develop standards for the description of these data, the Genomic Standards Consortium was established in 2005, in parallel with the workings of the TDWG. The Minimum Information about any (x) Sequence (MIxS) data standard, developed and ratified by the community through the Genomic Standards Consortium (Yilmaz et al., 2011), links important metadata such as DNA extraction and sequencing protocols, as well as environmental measurements, to the nucleotide sequences, which is fundamental to their correct use and interpretation.
The SOOS is an international collaboration under the auspices of SCAR and the Scientific Committee on Oceanic Research (SCOR) to collect and deliver sustained and coordinated observational data on dynamics and change of Southern Ocean systems to researchers and other stakeholders, including governments and industries. Launched in 2011, its tasks include the design, advocacy, and implementation of observing and data sharing systems (Meredith et al., 2013). SOOS has defined its vision and begun improving the coordination of observing efforts through Regional Working Groups, which coordinate the collection of observations, Capability Working Groups, which solve particular technical challenges, and a Data Management Sub-Committee, which coordinates data sharing systems for polar and oceanographic data centres (Newman et al., 2019). In 2017, SOOS collaborated with the European Marine Observation and Data Network Physics group to create SOOSmap, a portal of standardised datasets for waters south of 40°S. SOOS, SCADM, and the Arctic Data Committee are collaborating to design a best-practice approach to implementing schema.org for discovery metadata for all disciplines, with the goal of establishing single-window dataset search tools.

BUILDING THE SYSTEM
Monitoring
Logistical challenges inherent to biological sampling in the Southern Ocean have resulted in a strong bias in the distribution of sampling locations. Due to unfavourable conditions during the Antarctic winter, including extended darkness and sea ice cover, sampling is mostly undertaken during summer. Even in summer, high sea ice concentrations are a limiting factor, with sampling gaps corresponding to areas with sea ice concentrations above 20% (Griffiths et al., 2014). Furthermore, sampling is often concentrated around continental shelves, islands, research stations, and the logistical routes to access them.
Primary biodiversity data about the Antarctic ecosystem can originate from different sampling schemes with different geographic, taxonomic and temporal coverage, resulting in five broad categories: limited monitoring data, extensive monitoring schemes, intensive monitoring schemes, ecological field studies, and remote sensing (Proença et al., 2017). In addition, an increasing number of automated detection mechanisms have greatly benefited from technological developments in recent decades, such as camera traps, biotelemetry, biologgers, biogeochemical Argo floats, and various automated vehicles, such as automated underwater vehicles or unmanned aerial vehicles. These technologies minimise the environmental impact of research and allow the collection of high definition still and moving images, sound, environmental DNA (e-DNA) or biogeochemical measurements (Miloslavich et al., 2018; Canonico et al., 2019).
Extensive monitoring schemes in the Southern Ocean are limited, and most of them focus on specific regions. As such, they may not be representative of ecosystems in other areas of the Southern Ocean. In the Peninsula region, there are roughly ten major sustained research efforts, including the United States Antarctic Marine Living Resources and Long Term Ecological Research Network programs, which commenced in 1986 and 1990, respectively. These programs, along with long term operations by Chile, Argentina, United Kingdom, China, Poland, and others, have provided comprehensive knowledge on productivity, zooplankton, benthic communities, birds, marine mammals, and physical parameters such as sea ice and temperature (see e.g., Henley et al., 2019 and references within). They also tend to happen on timescales where any follow-up sampling may be years later. One notable exception of a long term, extensive monitoring scheme is the SCAR Southern Ocean Continuous Plankton Recorder Survey, which was established in 1991 by the Australian Antarctic Division to map the spatio-temporal patterns of plankton biodiversity and to use the sensitivity of plankton to environmental change as an early warning indicator of the health of the Southern Ocean (Hosie et al., 2003; Pinkerton et al., 2020). The current dataset comprises over 47,600 segments, each representing the laboratory analysis of zooplankton samples within water filtered by a Continuous Plankton Recorder while it travelled approximately five nautical miles.
CCAMLR established its Ecosystem Monitoring Program (CEMP) in 1989 to "detect and record significant changes in critical components of the marine ecosystem" and "distinguish between changes due to harvesting of commercial species and changes due to environmental variability" (CCAMLR). The program focuses on marine predator species that rely on fish or krill resources and are used as indicators for ecosystem changes: Adélie (Pygoscelis adeliae), chinstrap (P. antarcticus), gentoo (P. papua) and macaroni penguin (Eudyptes chrysolophus), black-browed albatross (Thalassarche melanophris), Antarctic (Thalassoica antarctica) and cape petrel (Daption capense), and Antarctic fur seal (Arctocephalus gazella; CCAMLR).
Another data collection method is utilised by MEOP (Marine Mammals Exploring the Oceans Pole to Pole) and other projects. The MEOP consortium consists of several international initiatives that started as part of the International Polar Year in 2008. Marine mammals, e.g., elephant seals, are equipped with CTD sensors and collect oceanographic data on their foraging trips around Antarctica (Treasure et al., 2017). The data are publicly available. Tracking data from these and other studies were included in the Retrospective Analysis of Antarctic Tracking Data (RAATD), the first analysis of circum-Antarctic tracking data, which was executed under the auspices of SCAR (Hindell et al., 2020; Ropert-Coudert et al., 2020).
Satellites can measure phytoplankton productivity and biomass and even roughly estimate functional composition based on reflectance spectra that contain the characteristic absorption, scattering, and fluorescence signatures of major algal groups (Muller-Karger et al., 2018a). Although some satellite products of continental Antarctica are available at resolutions of tens of metres or finer, satellite-derived data for the Southern Ocean tends to be much coarser in spatial resolution. Varying daylight hours and polar winter, floating sea ice, and cloud cover also limit the seasonal resolution and geographic distribution of data. Satellites cannot provide a depth-integrated view of the ocean as they only observe surface waters. Deep chlorophyll maxima, as the name suggests, are located in the subsurface, and so open ocean regions typically appear low in satellite-derived chlorophyll (Blondeau-Patissier et al., 2014). Satellite-based instruments are also limited in the aspects of the ecosystem that are directly observable. Although satellites are able to cover vast areas and obtain near real-time information on coarse phytoplankton patterns, they are currently not sufficient for a continuous and fine-scale assessment of the Southern Ocean ecosystem.
As an extension of the existing Argo programme, a global network of biogeochemical Argo floats collects vertical water profiles of oxygen, nitrate, pH, chlorophyll-a concentration, suspended particles, and downwelling irradiance (Organelli et al., 2017). In the Southern Ocean, several international programmes regularly deploy new floats, including the Southern Ocean and Climate Field Studies with Innovative Tools (France) and the Southern Ocean Carbon and Climate Observations and Modeling (United States) projects.
Most of the biological observations available for the Southern Ocean stem from commercial fishery activities and scientific expeditions. In 1992, CCAMLR adopted a Scheme of International Scientific Observation, which established a framework for independent fisheries observers to collect data aboard commercial fishing vessels (2020; www.ccamlr.org). All toothfish and icefish fisheries require 100% observer coverage and krill fisheries require 50% observer coverage. Observers collect data on catch composition, biological measurements of target and by-catch species (e.g., length), gear configuration, any incidental mortality of birds or mammals, and any indications of vulnerable marine ecosystems (e.g., through coral pieces found on fishing gear). Observers also collect samples, such as otoliths, for use in later age and growth studies. Further, following a standardised tagging protocol (see www.ccamlr.org, 2020), observers assist in deploying tags on fish and skates and recording information on tag recaptures; this is essential given that CCAMLR uses these tag and recapture data to inform stock assessments for toothfish. The data collected by observers feed directly into knowledge development and management products (see below) but are not publicly available.

Protocols
With the establishment of CCAMLR, it was realised that to effectively regulate the harvesting of Antarctic living marine resources, the effect of such harvesting on species would have to be monitored. The species of primary interest are those, such as birds and seals, which prey on the commercially harvested species (currently Antarctic krill Euphausia superba, toothfishes Dissostichus eleginoides and D. mawsoni, and mackerel icefish Champsocephalus gunnari). The Working Group on Ecosystem Monitoring and Management (WG-EMM) is responsible for the design and coordination of the monitoring programme and the analysis and interpretation of the data arising from it. Since the establishment of CEMP standard methods in 1987, CCAMLR has collected data from over 50 combinations of sites, species, and parameters. At least eight members are currently involved in acquiring data. For some series, data are available from the late 1950s, but most data series start in the mid-1980s when CEMP was initiated. In August 1997, a new edition of the CEMP Standard Methods was produced following substantial revision of most methods and the adoption of a number of new standard methods. It included observation protocols and techniques, as well as a set of reference materials.
Scientific expeditions in the Southern Ocean are often ecological field studies: they address specific scientific questions, often use specific sampling schemes, and it may take several years or even decades before an area is visited again. Such one-off expeditions result in limited comparability, but a few major multi-national scientific programmes have tried to standardise survey methods. These include the BIOMASS project, which collected comparable data during 34 cruises involving 13 countries and three field experiments (El-Sayed, 1991), and CAML, which suggested a number of standardised protocols that all 18 participating expeditions agreed to for all habitat types and biological realms (De Broyer and Koubbi, 2014).

(Biological Data) Standards
Standards provide one way of organising information across projects and time, making data more FAIR. For biological data, the Darwin Core standard has become one of the most widely used, and is implemented by OBIS, GBIF, and others. Darwin Core was initially developed for museum collection data but has since expanded to document a range of biodiversity data, from metagenomics to sampling events and species abundance metrics (Wieczorek et al., 2012). The Darwin Core standard and other biological data standards are managed by the nonprofit Biodiversity Information Standards consortium (TDWG, formerly the Taxonomic Databases Working Group), in which various stakeholders are represented.
The Darwin Core standard consists of a set of well-defined terms that refer to different aspects of the data, such as the identity of an observed taxon (scientificName), the location of the observation (decimalLatitude, decimalLongitude, locality, ...), and the time (year, month, day, and time). Other terms provide further details and metadata on observations, sampling events, or sampled material. The format for datasets that use Darwin Core terms is called "Darwin Core Archive," which is a self-contained zipped folder that consists of a set of text files (CSV tables) that hold the data, a simple XML document that describes how these files are organised, and a metadata XML file following the Ecological Metadata Language (EML). The data CSV files in a Darwin Core Archive are linked in a star-like manner, with one core data file surrounded by any number of "extensions." The core data may consist of species occurrences, sampling events, checklists, or material samples. The extensions are used to provide additional information or measurements that are usually specific to a study, sampling campaign, or experimental set-up.
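The archive layout described above can be illustrated with a minimal sketch in Python. It assembles an occurrence core table, a meta.xml descriptor, and a metadata file into a single zipped archive held in memory. The records are hypothetical, the file name occurrence.csv is an assumption, and the EML document is a bare placeholder that would not validate against the EML schema; a real archive would also typically carry one or more extension files.

```python
import csv
import io
import zipfile

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical occurrence records; the values are illustrative, not real observations.
FIELDS = ["occurrenceID", "scientificName", "decimalLatitude", "decimalLongitude", "eventDate"]
ROWS = [
    ["occ-1", "Euphausia superba", "-62.5", "-58.9", "2019-01-15"],
    ["occ-2", "Pygoscelis adeliae", "-66.7", "140.0", "2019-01-20"],
]

# Placeholder metadata only; a valid EML file needs creator, contact, and more.
EML_PLACEHOLDER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" '
    'packageId="example-package" system="example">\n'
    "  <dataset><title>Hypothetical example dataset</title></dataset>\n"
    "</eml:eml>\n"
)


def core_csv(fields, rows):
    """Serialise the core table as CSV text with one header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buffer.getvalue()


def meta_xml(fields):
    """Describe how the CSV columns map onto Darwin Core terms."""
    field_lines = "\n".join(
        f'    <field index="{i}" term="{DWC_TERMS}{name}"/>'
        for i, name in enumerate(fields)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"\n'
        '        fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"\n'
        f'        rowType="{DWC_TERMS}Occurrence">\n'
        "    <files><location>occurrence.csv</location></files>\n"
        '    <id index="0"/>\n'
        f"{field_lines}\n"
        "  </core>\n"
        "</archive>\n"
    )


def build_archive(fields, rows):
    """Return the bytes of a zipped archive: core table, descriptor, metadata."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.csv", core_csv(fields, rows))
        archive.writestr("meta.xml", meta_xml(fields))
        archive.writestr("eml.xml", EML_PLACEHOLDER)
    return buffer.getvalue()
```

Calling build_archive(FIELDS, ROWS) returns bytes that can be written to a .zip file. An extension table would be added as a further CSV file with its own block in meta.xml, keyed on the core identifier.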
OBIS has pioneered an extension to Darwin Core that records measurements or facts about the species occurrences as well as about the events in which the occurrences were observed (De Pooter et al., 2017). With this extension, data managers can document, in a standardised way, information like catch per unit effort or percentage cover, as well as abiotic measurements like temperature and salinity. Moreover, this extension includes fields for documenting links to vocabularies helping to disambiguate text descriptions of sampling methods or aggregations.
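As a rough illustration of how such measurement records hang off the core data, the sketch below groups measurement rows by the event or occurrence they describe. The rows are hypothetical, and the vocabulary URIs use a made-up example.org domain rather than any real controlled vocabulary; only the column names follow the measurement terms used with Darwin Core.

```python
from collections import defaultdict

# Hypothetical measurement rows. Rows without an occurrenceID describe the
# sampling event itself; rows with one describe a specific occurrence.
MEASUREMENTS = [
    {"eventID": "trawl-1", "occurrenceID": "",
     "measurementType": "sea surface temperature",
     "measurementTypeID": "https://vocab.example.org/temperature",  # made-up URI
     "measurementValue": "-0.8", "measurementUnit": "degrees Celsius"},
    {"eventID": "trawl-1", "occurrenceID": "occ-1",
     "measurementType": "catch per unit effort",
     "measurementTypeID": "https://vocab.example.org/cpue",  # made-up URI
     "measurementValue": "12.5", "measurementUnit": "kg per hour"},
    {"eventID": "trawl-1", "occurrenceID": "occ-2",
     "measurementType": "catch per unit effort",
     "measurementTypeID": "https://vocab.example.org/cpue",  # made-up URI
     "measurementValue": "3.1", "measurementUnit": "kg per hour"},
]


def group_measurements(measurements):
    """Split rows into event-level and occurrence-level lookups."""
    by_event = defaultdict(list)
    by_occurrence = defaultdict(list)
    for row in measurements:
        if row.get("occurrenceID"):
            by_occurrence[row["occurrenceID"]].append(row)
        else:
            by_event[row["eventID"]].append(row)
    return dict(by_event), dict(by_occurrence)


def values_of_type(rows, type_id):
    """Select values by vocabulary identifier rather than by free-text label."""
    return [float(r["measurementValue"]) for r in rows if r["measurementTypeID"] == type_id]
```

Filtering on measurementTypeID instead of the free-text measurementType is what makes records from different providers comparable, provided they point to the same vocabulary entry.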
The use of different sampling methods and gear significantly affects which species are collected (and their reported abundance). This means that no single method is sufficient to sample all biodiversity and that results from different methods need to be integrated. For data to be useful in a broader context, it is necessary to document and classify where and how they are sampled, which sampling gear is used, and how the data are processed.
Nucleotide sequence data is also commonly used for biodiversity assessments, especially for studying microorganisms or when implementing non-invasive species detection strategies based on environmental DNA (eDNA). There is a well-established practice of making the sequence data available (FASTA or FASTQ) in repositories under the International Nucleotide Sequence Database Collaboration. However, while the Genomic Standards Consortium developed a standard to describe associated metadata and environmental measurements (MIxS; Yilmaz et al., 2011), this information is often lacking.
Since nucleotide and biodiversity data standards evolved in different communities, they are currently not fully compatible. The Darwin Core standard is historically based on Linnaean taxonomy. Microbial or eDNA biodiversity currently includes taxa that are known only from their sequences and that have no formal morphological description or Linnaean name. Therefore, the integration of this molecular-based data is non-trivial. With the advent of event-core as part of Darwin Core, the integration of non-Linnaean biodiversity data becomes more feasible. There is currently strong interest in integrating such data (e.g., at OBIS and GBIF). Furthermore, efforts are underway to harmonise standards, e.g., through the Genomic Biodiversity Working Group created by TDWG and the Genomic Standards Consortium to harmonise Darwin Core and MIxS terms.
More recently, in the United Nations Decade on Ocean Science framework, the Ocean Best Practices System aims to enhance the management of methods across research and support the development of best practices. Typically, regular meetings and working groups, combined with mechanisms for raising issues and providing feedback, are the primary means by which community consensus is built around adopting and developing data standards and best practices. In turn, such standards are fundamental to internationally collaborative scientific projects or biodiversity monitoring schemes, where data need to be interoperable between research groups, observation stations and time. However, Tanhua et al. (2019) note that many best practices in ocean data management are currently poorly defined and that much work is needed to identify and promote the adoption of best practices.

Accessibility
The vast size of the Southern Ocean, its remoteness relative to human populations, the extreme weather conditions, and the costs of operating research vessels all strongly constrain our ability to regularly sample biota. Consequently, access to data from expeditions and chemo-physical observation systems is vital to cover sufficient space and time for ecosystem assessments. The need for open and accessible data is mandated by the Antarctic Treaty System, which has led to several initiatives to promote FAIR data and tools to increase the discoverability and accessibility of open data. The largest resource for finding Antarctic metadata and data is the Antarctic Master Directory, curated by the Standing Committee on Antarctic Data Management of SCAR.
For CCAMLR's Convention area, all summary catch and fishing effort statistics since 1970 are made publicly available in the annual Statistical Bulletins. In addition, CCAMLR receives data from scientific observers onboard fishing vessels, research survey data, and data from the CCAMLR Ecosystem Monitoring Program (CEMP; Caccavo et al., 2021). In contrast to the summary statistics, these high-resolution fishery and research data are not made available to those outside of CCAMLR due to commercial confidentiality concerns by fishing members. However, permission to access these data can be requested in accordance with the Rules for Access and Use of CCAMLR Data.
A direct access point for finding Antarctic and Southern Ocean biodiversity data from research expeditions, such as species occurrence observations or nucleotide sequence data, is the SCAR Biodiversity.aq portal. Biodiversity.aq is part of the European LifeWatch ERIC infrastructure and is the regional thematic node of OBIS and GBIF, to which data are published and made discoverable. Biodiversity data can also be downloaded programmatically through Application Programming Interfaces (APIs), for example via R packages such as robis (Provoost et al., 2019), rgbif (Chamberlain and Boettiger, 2017), and spocc (Chamberlain, 2021).
SOOSmap (http://soosmap.aq/) is a web portal that makes standardised environmental datasets available for waters south of 40°S. For an overview of available datasets see Table 1. It collates physical, chemical, and biological observations into a single portal to maximise interdisciplinary data discovery. This involves both developing new data-sharing connections for widely distributed data types (e.g., oceanographic mooring data is held in dozens of separate data centres, and stored in multiple standard and non-standard formats) and, where no standards have yet been agreed to, developing data-sharing tools that allow a standard to be developed (e.g., data from the SCAR Plastic in Polar Environments Action Group is served to SOOSmap via a simple online spreadsheet).
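For readers working outside R, programmatic access of this kind can be sketched in Python using only the standard library. The example below assumes the public OBIS web service at https://api.obis.org/v3/occurrence and its scientificname, geometry, and size query parameters; these should be checked against the current OBIS API documentation before use. The bounding polygon for waters south of 40°S is an illustrative choice, and the fetch function is shown but not called because it needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the OBIS occurrence service; verify against the OBIS docs.
OBIS_OCCURRENCE_URL = "https://api.obis.org/v3/occurrence"

# Illustrative polygon (WKT, longitude latitude) covering waters south of 40°S.
SOUTH_OF_40S = "POLYGON((-180 -90, 180 -90, 180 -40, -180 -40, -180 -90))"


def occurrence_query_url(scientific_name, geometry_wkt=SOUTH_OF_40S, size=5):
    """Build a query URL for occurrence records of one taxon within a polygon."""
    params = {
        "scientificname": scientific_name,
        "geometry": geometry_wkt,
        "size": size,
    }
    return f"{OBIS_OCCURRENCE_URL}?{urlencode(params)}"


def fetch_occurrences(scientific_name, size=5, timeout=30):
    """Download and decode the JSON response (requires network access)."""
    url = occurrence_query_url(scientific_name, size=size)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Building the URL separately from fetching it keeps the query itself inspectable and easy to record alongside an analysis, which supports reproducibility.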

Modelling and Analysis
As discussed earlier, and despite continuing increases in data collection, publishing efforts, and the development of new data capture technologies, data support for marine ecosystem assessments in the vast and remote Southern Ocean remains challenging. Observations are often sparse or biased in aspects such as spatial (horizontal and vertical), temporal (seasonal, interannual), taxonomic, and trophic level coverage. Data limitations, coupled with the increasing need for sophisticated assessment outputs, mean that moving from marine ecosystem data to an ecosystem assessment increasingly involves some form of modelling. Here we use the term "model" in the inclusive sense of Constable et al. (2016) to include conceptual, qualitative, statistical, empirical, and dynamic mathematical models.
Models have differing utility depending on the type and details of the model but may assist in processes such as identifying change within a set of observations, identifying causal mechanisms (attribution of change), and testing our understanding of systems. A model's ability to make inferences can also help alleviate issues of data coverage. This can take the form of interpolation, where the model is used to make estimates within the domain of the existing data, effectively filling in gaps. This can include the provision of information regarding ecosystem processes that cannot easily be directly observed. Inferences can also be predictive, that is, applied to a broader domain than that encompassed by existing data. Predictions can extrapolate to new geographic areas, different time periods, and operational regimes representing perturbations beyond previously observed states. Prediction can be used for forecasting or hindcasting ecosystem properties and behaviour over time, evaluating scenarios (e.g., management interventions, climate change), and exploring the consequences of observed change.
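The distinction between interpolation and extrapolation can be made concrete with a deliberately simple statistical model. The sketch below fits a straight line by ordinary least squares to hypothetical observations of a response against a single covariate, then labels each new estimate according to whether the covariate value falls inside the observed range. Real ecosystem models are far richer; the point is only that the same fitted model carries different levels of support inside and outside the data domain.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate(xs, ys, x_new):
    """Return the model estimate at x_new and the kind of inference it is."""
    slope, intercept = fit_line(xs, ys)
    inside = min(xs) <= x_new <= max(xs)
    kind = "interpolation" if inside else "extrapolation"
    return slope * x_new + intercept, kind


# Hypothetical observations: covariate values and the measured response.
observed_x = [0.0, 1.0, 2.0, 4.0]
observed_y = [1.0, 3.0, 5.0, 9.0]

gap_value, gap_kind = estimate(observed_x, observed_y, 3.0)          # within the data
beyond_value, beyond_kind = estimate(observed_x, observed_y, 6.0)    # beyond the data
```

Here the estimate at 3.0 fills a gap between observations, whereas the estimate at 6.0 relies on the assumption that the fitted relationship continues to hold outside the sampled conditions.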
Transparency, provenance, and the reproducibility of scientific results are cornerstones of the scientific method (Ma et al., 2014;Toelch and Ostwald, 2018). The critical importance of modelling and analysis to ecosystem assessments means that any informatics system must support the integration of data with analysis pipelines. Furthermore, internet-based technologies have led not only to an increase in data sources and data volumes but also to the increased availability of sophisticated analytical methods. As such, the limits of the traditional reporting mechanism of a materials and methods section have been surpassed, and a new approach is required to describe the what, how, when, where and why of an analysis to ensure that it is completely reproducible. To achieve this, not only the data but also the tools that have been used to generate the results need to be open and FAIR. Notably, efforts to this effect must support the development of appropriate software tools (e.g., Borregaard and Hart, 2016) and also strive to build supportive and inclusive communities that empower users to apply and build on these tools (Boettiger et al., 2015).
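One lightweight way to capture the what, how, when, where and why of an analysis is to write a small machine-readable provenance record next to each result. The sketch below is one possible form, not an established standard: the field names are assumptions chosen to mirror the questions above, and the input is fingerprinted with a SHA-256 hash so that a later reader can confirm the same data were used.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def provenance_record(input_bytes, what, how, where, why, parameters):
    """Describe one analysis run in a form that can be stored with its output."""
    return {
        "what": what,                      # the question or product
        "how": how,                        # the method or script that was run
        "when": datetime.now(timezone.utc).isoformat(),
        "where": where,                    # spatial domain or data source
        "why": why,                        # purpose of the analysis
        "parameters": parameters,          # settings needed to repeat the run
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "python_version": platform.python_version(),
    }


# Hypothetical example: the data and descriptions are illustrative only.
example_input = b"occurrenceID,scientificName\nocc-1,Euphausia superba\n"
record = provenance_record(
    example_input,
    what="krill occurrence summary",
    how="count of occurrence records per taxon",
    where="waters south of 40 degrees S",
    why="illustration of a provenance record",
    parameters={"grid_resolution_degrees": 1.0},
)
record_json = json.dumps(record, indent=2, sort_keys=True)
```

Storing record_json alongside the output lets the analysis be traced back to an exact input and configuration; dedicated workflow and provenance tools offer much fuller coverage of the same idea.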

Knowledge
As noted above, Southern Ocean ecological monitoring (and thus knowledge) is concentrated at specific locales (e.g., the western Antarctic Peninsula) and/or focused on specific species (e.g., commercially valuable species). Due to these long-term comprehensive research efforts, climate change trends are most evident in the western Antarctic Peninsula (e.g., Hendry et al., 2018). However, these changes and related impacts on the ecosystem remain complex and difficult to disentangle from other human impacts, such as fishing and tourism, which are also concentrated in the region (Hogg et al., 2020).
Due to ongoing commercial fisheries and scientific research in the Southern Ocean, extensive data has been collected regarding the basic life history of toothfish and krill. The above-noted US Antarctic Marine Living Resources program has led long-term surveys of krill, lending insight into age, growth and recruitment (e.g., Kinzey et al., 2019), as have the programmes of other nations. For toothfish, data collection has mainly been fisheries-dependent, yet has also lent insight into age, growth, maturity, and other basic life history parameters (e.g., Hanchet et al., 2015). Nevertheless, even in these well-studied species, major gaps and uncertainties remain. For toothfish, uncertainties around connectivity and ecosystem dynamics remain, including impacts on predator and prey populations (e.g., Abrams et al., 2016). In addition, significant gaps remain in the status of bycatch species caught in toothfish fisheries, including macrourids, skates, and sharks (SC-CAMLR, 2018), and in understanding the role of toothfish in the diets of killer whales (Pitman et al., 2018) and Weddell seals (Salas et al., 2017).
Overall, with the limited exceptions noted above, considerable gaps exist in our data collection and thus in our knowledge of Antarctic marine systems. For example, a baseline estimate of Antarctic marine biodiversity has yet to be completed (Chown et al., 2017;Chown and Brooks, 2019). Further, even when data regarding status and trends exist, the drivers behind these trends can be unclear. For example, Adélie penguin (Pygoscelis adeliae) populations are stable in some places, while in other places they are declining (e.g., parts of the western Antarctic Peninsula) or increasing (e.g., in the Ross Sea; Che-Castaldo et al., 2017). The drivers behind this variation are not well understood but are likely a combination of climate change impacts, commercial fishing, and potentially historical whaling and the resulting prey release (e.g., Ainley et al., 2017;Hinke et al., 2017). The status of the pack-ice seals, namely Ross (Ommatophoca rossii), crabeater (Lobodon carcinophaga), Weddell (Leptonychotes weddellii), and leopard (Hydrurga leptonyx) seals, remains understudied and largely unknown (e.g., Southwell et al., 2012). The status of Southern Ocean whales, many of which were heavily exploited in the 20th century, also remains uncertain. Some species are estimated to be recovering (e.g., humpback whales, Megaptera novaeangliae), others lack data sufficient for assessment (e.g., fin whales, Balaenoptera physalus), and others are still considered critically endangered (e.g., Antarctic blue whale, B. musculus intermedia; Thomas et al., 2016).
Innovations and new applications are lending knowledge and insights into Antarctic marine systems. For example, satellite remote sensing has proved useful for identifying penguins and seals and for estimating their distribution and abundance (Fretwell et al., 2012; LaRue et al., 2011). These efforts may provide cost-effective innovations for monitoring in the future, including for marine protected areas (MPAs; LaRue et al., Submitted). Acoustics are also offering new insights with potential applications for population monitoring for krill (Reiss et al., 2020) and whales (Miller and Miller, 2018). Southern Ocean applications of computational approaches and modelling are also providing new insights. For example, applications of Bayesian methods reveal insight into circumpolar abundance trends for Adélie penguins (Che-Castaldo et al., 2017) and earth system models are beginning to reveal longer-term changes in krill growth throughout the Southern Ocean (Sylvester et al., 2021). Applications to better understand the status and the drivers of Southern Ocean species and ecosystems are urgently needed, especially in light of the growing human impacts in the Southern Ocean.
The knowledge gaps highlighted above do not reduce the urgency for management actions that will protect and sustain Southern Ocean ecosystems into the future. Such actions must be based on the best available science, accommodating the uncertainties associated with current knowledge. Uncertainty will always be present to some degree and does not necessarily preclude action from being undertaken. Management processes must engage with uncertainty and seek to undertake actions that are robust to it wherever possible (Press, 2021). Future priorities, therefore, include not only an increased capacity for knowledge generation and delivery of products into management but also improvements in methods and applications of decision making under uncertainty. Innovations in encapsulating knowledge into products that are usable by managers are also urgently needed.

Products
Knowledge and data products should help inform management and guide the implementation of monitoring protocols. Beyond FAIR data sharing principles, best practices exist for action ecology and actionable science (Beier et al., 2017). Action ecology calls for transdisciplinary input, closing the gap between findings and implementation, using the best available technology, and providing policy-ready recommendations (White et al., 2015). Actionable science can be defined as the data, analyses, projections or tools needed to support decision-making (Beier et al., 2017). The key principles of actionable science comprise credibility (i.e., sound science), saliency (i.e., relevance to management), and legitimacy (i.e., inclusive of stakeholder objectives and values; Cash et al., 2006;Beier et al., 2017). Coproduction of actionable science involves iterative collaboration between scientists, managers, and other stakeholders and offers a model for effective collaboration when managers (e.g., CCAMLR or Antarctic Treaty Parties) need to base multi-faceted decisions on complex scientific information (Sylvester and Brooks, 2020). These best practices for FAIR principles, action ecology, and actionable, co-produced science may help in achieving effective products for decision-making but have not necessarily been used to their full extent in Southern Ocean applications.
CCAMLR and the Antarctic Treaty Consultative Parties (with advice from the Committee for Environmental Protection) manage the Southern Ocean, with the exception of some subantarctic islands under national jurisdiction. The scientists in these fora primarily either work for their national governments or serve on a delegation to CCAMLR or the ATCM. Further, some data, especially data collected through fisheries management, remain out of the public domain yet are often required for analyses and products such as stock assessments and independent reviews that are key to management processes. These arrangements can be challenging for scientists who are external to national delegations but who produce valuable knowledge and wish to engage in co-production, and they thus limit the scope of input to these management bodies.
One potential avenue for the ATCM and CCAMLR to increase their capacity for management-ready science products could be to more actively facilitate the engagement of a greater cross-section of the scientific community. For example, in the spirit of FAIR principles, fisheries data should be linked to detailed, findable metadata and, wherever feasible, made more easily accessible for analysis by independent scientists. SCAR provides an illustration of the value of this approach as, especially in the ATCM, it is often invited to present data and products to help inform specific management decisions. For example, SCAR provided an in-depth report regarding the impact of drones on wildlife which helped inform the adoption of ATCM Resolution 4 (ATCM, 2018), "Environmental Guidelines for operation of Remotely Piloted Aircraft Systems in Antarctica." SCAR also hosts the Antarctic Environments Portal, which provides concise, technically accurate, politically neutral summaries of the current state of knowledge that are accessible and understandable to policymakers. These examples illustrate SCAR's role as a bridging organisation, delivering science-based products in the form of analyses, white papers, or other formats.
An end-to-end system in the Southern Ocean could create further opportunities for independent science (and scientists) to feed into the management processes of CCAMLR and the ATCM. While scientists external to these bodies can publish papers, data sets, or other products, these are often not received or used in management. Currently, the participants in these fora primarily either work for their national governments or serve on a delegation to CCAMLR or the ATCM, and so an independent scientist cannot present a paper for consideration by these bodies other than through a national government. This process could be more open, transparent and streamlined. First, fisheries data, in the spirit of FAIR principles, should be linked to detailed, findable metadata and, wherever feasible, made more easily accessible for analysis by independent scientists. Secondly, independent scientists working with Southern Ocean data could work to follow best practices for co-production of actionable science. This would result in the creation of credible science products that are salient to the needs of the managers and that legitimately include stakeholder views.
The recent proposal for a marine protected area in the western Antarctic Peninsula provides a similar example. The preparation of that proposal, including the creation of data layers and products, was conducted mainly by national scientists but did include input from independent scientists and other stakeholders (Sylvester and Brooks, 2020).

CONCLUSION
Biodiversity data is a critical aspect of any integrated marine ecosystem assessment. Within the Southern Ocean, limited availability of biodiversity data will remain an issue as the region is remote and is characterised by harsh environmental conditions. This necessitates an internationally collaborative approach that is, fortunately, ingrained within the Antarctic Treaty. International initiatives such as CAML and the 4th IPY have contributed to sharing protocols and applying data standards, resulting in the mobilisation of data with global networks like OBIS and GBIF. During CAML and IPY, there was a strong focus on historical data and the development of a baseline assessment of the Southern Ocean (De Broyer and Koubbi, 2014). The wealth and volume of new data associated with new technology represent a huge challenge, placing increased demand on informatics and requiring high-capacity computing capabilities.
Data infrastructures such as SCAR-MarBIN/SCAR Antarctic Biodiversity Portal, the Antarctic Master Directory, SOOS, and SOOSmap make Southern Ocean data findable and accessible and are well linked to the relevant international data communities. However, ensuring that data are interoperable and reusable remains a complex task. A key step towards achieving this will be the application of open science principles to the whole research cycle. Furthermore, it requires the engagement of all stakeholders, most notably policymakers (Box 1).
The application of EBVs, EOVs, or eEOVs represents a strong way forward in the monitoring and assessment of Southern Ocean ecosystems. However, applying these Essential Variables will require prioritising specific variables and the data to be collected at the level of CCAMLR. This will require close collaboration between policymakers and researchers and the application of action ecology to get the scientific input we need for the Southern Ocean we want.

AUTHOR CONTRIBUTIONS
AV coordinated the text contributions, wrote the introduction, and finalised the manuscript. PB, CB, HG, SH, BR, and MS contributed to the writing and editing of the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
AV and MS were funded by the Belgian Science Policy Office (BELSPO, contract n° FR/36/AN1/AntaBIS) in the Framework of EU-Lifewatch. HG was funded by the British Antarctic Survey and NERC. SH was supported by the Australian Research Council through a Laureate awarded to Philip W. Boyd, IMAS, UTAS, Australia (FL160100131). CB receives support through the Pew Charitable Trusts and the National Science Foundation.