Integrating Coral Restoration Data With a Novel Coral Sample Registry

In the past decade, the field of coral reef restoration has experienced a proliferation of data detailing the source, genetics, and performance of coral strains used in research and restoration. Resource managers track permits, species, restoration locations, and performance across multiple stakeholders, while researchers generate large data sets and data pipelines detailing the genetic, genomic, and phenotypic variants of corals. Restoration practitioners, in turn, maintain records on fragment collection, genet performance, outplanting location, and survivorship. While each data set is important in its own right, collectively they can provide deeper insights into coral biology and better guide coral restoration endeavors. Unfortunately, current data sets are siloed, with limited ability to cross-mine information for deeper insights and hypothesis testing. Herein we present the Coral Sample Registry (CSR), an online resource that establishes the first step in integrating diverse coral restoration data sets. Developed in collaboration with academia, management agencies, and restoration practitioners in the South Florida area, the CSR centralizes information on sample collection events by issuing a unique accession number to each entry. Accession numbers can then be incorporated into existing and future data structures. Each accession number corresponds to a specific collection event of coral tissue, whether for research, archiving, or restoration purposes. As such, the accession number can serve as the key to unlock the diversity of information related to that sample's provenance and characteristics across any and all data structures that include the accession number field. The CSR is open-source, freely available to users, and designed to be suitable for all coral species in all geographic regions.
Our goal is that this resource will be adopted by researchers, restoration practitioners, and managers to efficiently track coral samples through all data structures and thereby unlock a broader array of insights.

INTRODUCTION
The rapid decline in coral cover and health around the world is due to local, regional, and global threats (Hughes et al., 2018). The factors responsible for coral reef decline include climate change impacts (Hoegh-Guldberg et al., 2007; Carpenter et al., 2008; Doney et al., 2012) that cause coral bleaching and mortality (Hoegh-Guldberg, 1999; Eakin et al., 2010) and coral diseases (Aronson and Precht, 2001; Bruno et al., 2007). Complex interactions among herbivores (specifically fishes and urchins), seaweeds, and corals also affect the condition of coral reefs (Hixon, 2015). Proximity to large human populations is also associated with decline, as development, pollution, and overfishing can degrade coral reef habitats (Hughes and Connell, 1999; Fabricius, 2005; Pendleton et al., 2016). Without a substantial change in course, these stressors are expected to continue unabated, further degrading tropical coral reefs. This outcome would mean catastrophic loss of marine species, potential loss of tropical coral reef ecosystems, reduced food security for a large portion of the world's population, international security issues, risks to freshwater supplies, and increased coastal flooding. Consequently, protecting and restoring the world's tropical coral reefs has become increasingly important to both public and private interests across the global community (Hein et al., 2021).
To successfully address the long-term stability of coral reef ecosystems, three courses of action are required: first, mitigation of the stressors leading to coral mortality; second, maintenance and expansion of the current populations of reef-building corals; and third, implementation of methods to help corals adapt to evolving environmental conditions (Duarte et al., 2020; Hein et al., 2020; Vardi et al. in review). Tackling each of these is a major undertaking requiring a multi-disciplinary approach and extensive coordination among research, resource management, and restoration agencies. Consequently, informal knowledge-sharing organizations have been formed, such as the Coral Restoration Consortium (2021; Vardi et al. in review) and the International Coral Reef Initiative (2021), alongside other large-scale, centrally coordinated projects (e.g., Reef Plan 2050, Australia; Mission: Iconic Reefs, FL, United States; Reefense, United States).
Coral restoration, defined here as active interventions including coral population management, propagation, outplanting, and research, drives specific courses of action to counteract threats and to maintain and expand coral populations. This broad field is the product of integration across resource management agencies, academic research groups, and restoration practitioners, each with distinct yet partially overlapping interests (Figure 1). Resource management agencies, charged with the protection and regulation of coral species and their environs, coordinate, permit, and track activities relative to an overall management plan in accordance with the policies of their jurisdiction. They are concerned with collections from wild colonies, properties of wild colonies (e.g., disease presence), distribution records, utilization of collected samples, survivorship, and so forth. Academic researchers study specific aspects of the genomics, physiology, population structure, and ecology of particular corals (species or strains), reef community interactions, or geographic regions. They are increasingly investigating the physiological and genomic mechanisms that give rise to differences in phenotypic response based on genetics and genome-by-environment interactions, a field that is growing as identifying the factors underlying resilience becomes more important. Restoration practitioners actively collect, grow, and transplant corals to degraded reefs. They track the quantities and relative performance of corals across nursery and outplant settings. For each of these groups, tracking individual samples and the corresponding data is critically important. With increases in restoration activities and research, the amount of data generated is rapidly proliferating.
This information landscape is further complicated by the clonal nature of corals, which is leveraged in many coral propagation and restoration programs: each de novo collection event can result in a clonal lineage distributed widely among programs, habitats, and geographic areas.
For each of these coral restoration stakeholder groups, the fundamental unit being tracked is a unique instance of an observed coral (a colony of a certain genotype) identified or collected at a specific place and time. While each group is generating and tracking important information about the biology and restoration utility of specific strains of coral, this information is often isolated in idiosyncratic data storage systems that are agency- or project-specific. For example, the few restoration groups based in the Florida Keys each maintain their own data structures detailing collection, nursery, outplant, and performance records for any given genotype. While coral fragment swaps between groups do occur, these data are rarely combined into one central database. Rather, shared data are duplicated across groups and remain siloed.
Presently, access to information across all systems is not possible due to the lack of standard data fields, structures, and storage capacity, making it difficult to leverage the collective knowledge across groups for informed adaptive management decisions. Figure 2 provides a partial list of data structures generated by groups working toward management, research, and restoration goals for corals within just one small geographic range, the Florida Reef Tract. In addition to the physical isolation of datasets, issues of data integrity, disparate naming conventions, and even limited awareness of what information is available compound the problem. Thus, access and adjudication issues slow the spread of knowledge even in those cases where there is willingness to invest in cross-platform integration.
Adaptive management for coral reef restoration initiatives will depend on the ability to access the broadest collection of data possible, as efficiently and quickly as possible. To that end, information associated with specific coral restoration activities must be accessible across organizations managing, researching, and working with those strains. As the first step in addressing this problem, herein we present the Coral Sample Registry, a convenient, web-accessible centralized system whereby coral fragments used for management, research, or restoration can be registered at the time of collection and issued a unique identifier, the Accession Number. The accession number provides a common field which can be used to standardize the way various groups communicate about the same information and is associated with the sample thereafter across any and all data structures. The system is designed to be simple to use, independent of coral species or geographic location, and to provide multiple means for entering and accessing information. The Coral Sample Registry is not intended to directly link the different data repositories, but to provide a standardized key corresponding to unique coral samples that can then be used to unlock the information across different data repositories.

FIGURE 1 | Data types collected and used in coral restoration efforts across Restoration Practitioners, Academic Researchers, and Resource Management Agencies. Each of these broad categories may represent multiple groups working concurrently. For example, in Florida, resource management agencies could include NOAA Restoration Center, Florida Keys National Marine Sanctuary, Florida Fish and Wildlife Conservation Commission, Army Corps of Engineers, and the Department of Environmental Protection. * Indicates data types that may change between groups for the same sample. ∧ Indicates data types where one group may have multiple observations over time for a single sample.

MATERIALS AND METHODS
The concept for the Coral Sample Registry (CSR) was developed at an initial meeting with key stakeholders at the Reef Futures 2018 conference held in Key Largo, Florida. Representatives included restoration-practitioner groups, academic researchers, and United States federal and state resource management agencies, who met to discuss how to more efficiently access the various data streams being generated in order to better inform adaptive management of the restoration efforts occurring across the Florida Reef Tract. Four principles were agreed upon to guide this work. First, any solution should be accessible regardless of coral species or geographic location. Second, to the extent possible, best practices for data-repository construction and management should be employed. Third, any solution should be easy to use and should not impair the ongoing data management activities of existing stakeholders. Fourth, given the landscape of complex and diverse data structures already in existence, a more generalized solution was preferable to a specific one.
Based on these principles, we determined that the simplest and most effective solution was a system whereby individual coral samples could be assigned a unique identifier (hereafter an accession number) that could be incorporated into existing data structures. In this manner, the accession number would serve much like a hashtag: a shared tag allowing information in different data repositories to ultimately be integrated. Moreover, it would require minimal modification to existing data structures and no need to transfer or duplicate current data to a new platform. The CSR was designed with this narrow scope in mind: to be a registry of coral samples used in various restoration activities and to assign each sample a unique accession number. It is not intended to aggregate all information or even to directly link existing data structures; it is intended to provide a common key that can be integrated into existing data structures to allow cross-linking in the future as needs arise.
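To illustrate how a shared key allows later integration without moving data, the minimal sketch below joins two independently held record sets on an accession-number field. The record layouts, the field name `accession`, the helper `link_on_accession`, and the short placeholder values such as `ACC-0001` are all hypothetical; real accession numbers are longer generated strings, and each group would keep its own storage.

```python
from collections import defaultdict

# Hypothetical records held by two different groups. Each group keeps its own
# data; the only thing they share is the registry-issued accession number.
nursery_records = [
    {"accession": "ACC-0001", "nursery": "Nursery A", "fragments": 120},
    {"accession": "ACC-0002", "nursery": "Nursery B", "fragments": 45},
]
genotype_records = [
    {"accession": "ACC-0001", "genotype_call": "G17"},
    {"accession": "ACC-0003", "genotype_call": "G42"},
]


def link_on_accession(left, right, key="accession"):
    """Inner-join two lists of records on a shared accession-number field.

    A sample may have several rows on the right (e.g., repeated observations),
    so the index maps each accession number to a list of rows.
    """
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[key], [])]


linked = link_on_accession(nursery_records, genotype_records)
for record in linked:
    print(record)
```

Only `ACC-0001` appears in both sets, so `linked` holds a single merged record; the join happens on demand, and neither source is copied into a new platform.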
With this framework in mind, key aspects of the CSR are described below. Additional details can be found in the User's Guide document associated with the website.

Scope
The Coral Sample Registry has been developed to accommodate corals of any species in any geographic region. Although the project began with a focus on South Florida restoration, the final product is suitable for use globally.

Defining a Unique Sample for Assignment of Accession Numbers
A unique sample (i.e., base unit) is defined as the unique combination of six fields (defined later in the text): Sample Type, Local Sample Name, Collection Date, Genus, Species, and Organization. This represents the base unit to which an accession number is assigned.
The base unit defined in this manner differentiates collection of samples from the same wild colony at two different time points (or agencies), each sample receiving a unique accession number. This avoids the assumption that local samples, which might be subjected to different post-collection analysis or subsampling, be artificially conflated. The system provides sufficient flexibility to allow wild collections or sexual crosses to be registered and receive accession numbers.
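The six-field definition above behaves like a composite key. In the sketch below (the class name `BaseUnit`, the sample name `AP-12`, and the organizations are hypothetical), a frozen dataclass is hashable and compared field by field, so the same wild colony sampled on two dates, or by two organizations, yields distinct base units, while re-entering an identical combination does not.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BaseUnit:
    """The six fields whose combination identifies one collection event."""

    sample_type: str
    local_sample_name: str
    collection_date: date
    genus: str
    species: str
    organization: str


# The same wild colony ("AP-12") sampled by one organization on two dates...
first = BaseUnit("Wild Colony", "AP-12", date(2019, 5, 1), "Acropora", "palmata", "Org X")
second = BaseUnit("Wild Colony", "AP-12", date(2020, 6, 3), "Acropora", "palmata", "Org X")
# ...and by a second organization on the first date.
third = BaseUnit("Wild Colony", "AP-12", date(2019, 5, 1), "Acropora", "palmata", "Org Y")
# Re-entering an identical combination does not create a new base unit.
repeat = BaseUnit("Wild Colony", "AP-12", date(2019, 5, 1), "Acropora", "palmata", "Org X")

events = {first, second, third, repeat}
print(len(events))  # 3 distinct collection events, hence 3 accession numbers
```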
It is important to realize the base unit is a collection event and may not equate to a unique genotype. We believe that this is an important and powerful feature of the CSR. It allows a sample (of a putative genotype) to be tracked prior to any investment in genotyping. The dominant practice outside of research is to archive collected samples prior to genotyping, and many small restoration and management practitioners around the world may never have their samples genotyped (pers obs.). If samples are subsequently found to be the same genotype based on a common methodology, then this information can be tracked and adjudicated outside of the CSR, such as in the data structure associated with the genotyping method. Following best practice, we have intentionally avoided allowing the post-collection association of a "genotype" to registered samples in order to avoid conflicting sequencing methods or altering original entries. We believe deconfliction of potential synonymous genotypes is best done outside of the registry as these classifications may change as techniques and methodologies evolve.

Infrastructure
Amazon Web Services (AWS) is used for hosting the Coral Sample Registry website and database. AWS, a leader in cloud computing, provides security, high availability, and reliability, keeping the Coral Sample Registry accessible 24/7 across the globe.

Defined Users
Users of the CSR are classified as Registered Users. Registered Users can enter new sample information and access the full data repository. Registered Users are required to apply for access using an email address whose domain corresponds to their parent institution (e.g., @noaa.gov or @coralrestoration.org).
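The institutional-domain requirement could be screened automatically before an application is reviewed. This is a minimal sketch under stated assumptions: the allow-list `APPROVED_DOMAINS` and both helper functions are hypothetical, and the registry's actual approval workflow is not described beyond the domain requirement. Accepting subdomains (e.g., a lab-level address under an agency domain) is a design choice, not something the registry is known to do.

```python
# Hypothetical allow-list of institutional email domains.
APPROVED_DOMAINS = {"noaa.gov", "coralrestoration.org"}


def email_domain(email):
    """Return the lower-cased domain of an address, or None if malformed."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain.lower()


def has_institutional_domain(email, approved=APPROVED_DOMAINS):
    """True if the domain is approved or is a subdomain of an approved domain."""
    domain = email_domain(email)
    if domain is None:
        return False
    return any(domain == d or domain.endswith("." + d) for d in approved)


print(has_institutional_domain("jane.doe@noaa.gov"))    # True
print(has_institutional_domain("someone@example.com"))  # False
```

Matching on `"." + d` rather than a bare suffix prevents a look-alike domain such as `evilnoaa.gov` from passing.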

Data Entry and Data Access
Registered Users have two options when inputting data. Samples can be entered: (a) individually, entering each field through the web interface; or (b) via a bulk upload option using a pre-existing spreadsheet, for which a template is available. The bulk upload option recognizes errors and exports an error file for modification and re-upload, flags and prevents duplicate uploads, and offers an immediate export of newly added accession numbers for incorporation into the originator's databases.
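The bulk-upload behavior described above (error detection, an exportable error file, duplicate prevention) can be sketched as follows. This is an illustration under assumptions, not the registry's code: the column headers simply reuse the field names from the Data Fields section (the real template layout may differ), the helper names are hypothetical, and a row is treated as a duplicate when its base-unit key (five uploaded fields plus the submitter's organization) matches an existing entry or an earlier row in the same file.

```python
import csv
import io

# Required user-entered fields, as listed in the Data Fields section.
REQUIRED = ["Sample Type", "Latitude", "Longitude", "Country", "Genus",
            "Species", "Local Sample Name", "Collection Date"]
SAMPLE_TYPES = {"Wild Colony", "Sexual Recruit"}


def validate_upload(csv_text, organization, existing_keys=()):
    """Split uploaded rows into accepted rows and error rows."""
    accepted, errors = [], []
    seen = set(existing_keys)
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        problems = [f"missing {f}" for f in REQUIRED if not (row.get(f) or "").strip()]
        sample_type = (row.get("Sample Type") or "").strip()
        if sample_type and sample_type not in SAMPLE_TYPES:
            problems.append("invalid Sample Type")
        key = (sample_type, row.get("Local Sample Name"), row.get("Collection Date"),
               row.get("Genus"), row.get("Species"), organization)
        if not problems and key in seen:
            problems.append("duplicate entry")
        if problems:
            errors.append({**row, "line": line_no, "errors": "; ".join(problems)})
        else:
            seen.add(key)
            accepted.append(row)
    return accepted, errors


def error_file(errors):
    """Render error rows as CSV text so the user can fix and re-upload them."""
    if not errors:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(errors[0]))
    writer.writeheader()
    writer.writerows(errors)
    return out.getvalue()


upload = """Sample Type,Local Sample Name,Collection Date,Genus,Species,Latitude,Longitude,Country
Wild Colony,AP-12,2019-05-01,Acropora,palmata,25.0,-80.4,United States
Wild Colony,AP-12,2019-05-01,Acropora,palmata,25.0,-80.4,United States
Larva,AC-3,2019-08-10,Acropora,cervicornis,25.1,-80.3,United States
"""
accepted, errors = validate_upload(upload, organization="Org X")
print(len(accepted), [e["errors"] for e in errors])
```

Here the first row is accepted, the repeated row is rejected as a duplicate, and "Larva" fails the Sample Type picklist, so the error file carries two rows with their original line numbers.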

Editing of Previously Entered Information
Entered data can be subsequently edited, if needed, only by the Registered User who made the entry. All changes are captured in the metadata. Entries cannot be deleted; hence accession numbers will never be re-assigned to a new sample.
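The editing rules (only the submitter may edit, every change is logged, nothing is deleted) can be modeled as below. The class `RegistryEntry`, its attributes, and the user names are hypothetical and stand in for whatever the registry's database actually does.

```python
from datetime import datetime, timezone


class RegistryEntry:
    """A registry record that only its submitter may edit, with an audit trail.

    Deliberately has no delete method: entries are permanent, so an accession
    number can never be re-assigned to another sample.
    """

    def __init__(self, accession_number, submitter, fields):
        self.accession_number = accession_number
        self.submitter = submitter
        self.fields = dict(fields)
        self.history = []  # one dict per change, oldest first

    def edit(self, user, field_name, new_value):
        if user != self.submitter:
            raise PermissionError("only the submitting user may edit this entry")
        old_value = self.fields.get(field_name)
        self.fields[field_name] = new_value
        self.history.append({
            "user": user,
            "field": field_name,
            "old": old_value,
            "new": new_value,
            "changed_at": datetime.now(timezone.utc).isoformat(),
        })


entry = RegistryEntry("ACC-0001", "alice", {"Notes": ""})
entry.edit("alice", "Notes", "Parent colony showed partial mortality")
try:
    entry.edit("bob", "Notes", "overwrite attempt")
except PermissionError as exc:
    print("rejected:", exc)
```

Keeping the old and new values in the history, rather than only a timestamp, lets a reader reconstruct what an entry looked like at any point.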

Data Access and Viewing
Once data are uploaded, they are visible to all Registered Users on the CSR's browser tab. The same tab supports data filtering and downloads, up to and including the entire contents of the CSR. Should additional information about a specific entry be of interest, the submitting user's contact information is made available.
Data can be searched using any field or any combination of fields. In addition, there are a limited number of preexisting summary reports available for convenience, located under the Reports tab.
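Searching on any field or combination of fields amounts to an AND filter over field/value pairs. A minimal sketch, assuming records are dictionaries keyed by the field names used in this paper; the function `search` and the sample records are hypothetical, and case-insensitive matching is an illustrative choice rather than documented registry behavior.

```python
def search(records, criteria):
    """Return records matching every field/value pair in `criteria`.

    Text comparisons ignore case and surrounding whitespace; an empty
    `criteria` dict returns every record, like an unfiltered browse.
    """
    def norm(value):
        return value.strip().casefold() if isinstance(value, str) else value

    return [
        r for r in records
        if all(norm(r.get(f)) == norm(v) for f, v in criteria.items())
    ]


records = [
    {"Genus": "Acropora", "Species": "palmata", "Country": "United States", "Reef Name": "Reef 1"},
    {"Genus": "Acropora", "Species": "cervicornis", "Country": "United States", "Reef Name": "Reef 2"},
    {"Genus": "Orbicella", "Species": "faveolata", "Country": "Mexico", "Reef Name": None},
]
print(len(search(records, {"Genus": "acropora", "Country": "United States"})))  # 2
```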

Data Integrity, Back-Up and Redundancy
Coral Sample Registry data are stored in an isolated PostgreSQL relational database in AWS. Back-ups of the database are taken daily for disaster recovery. The data in the registry are replicated across multiple "zones" within the AWS network, providing an additional layer of data redundancy in the event of a failure.

Data Fields
The following summarizes the data fields within the CSR architecture; additional details can be found in the Coral Sample Registry User's Guide, accessible online.

Accession Number
Generated field. An accession number is a randomly generated 36-character alphanumeric string keyed using a randomized algorithm based on the server time. It is associated with a coral sampling event as described above (a unique combination of Sample Type, Local Sample Name, Collection Date, Genus, Species, and Organization fields).
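A 36-character string combining server time with randomness matches the canonical text form of a time-based UUID (version 1), so the sketch below assumes one; the registry's actual generator is not specified, nor is whether its strings keep the hyphens of the canonical form. The function name is hypothetical. A random 48-bit node value with the multicast bit set replaces the hardware address, as the UUID specification permits, so the identifier does not expose the server's MAC address.

```python
import secrets
import uuid


def new_accession_number():
    """Generate a 36-character, time-based identifier (UUID version 1).

    The node field is random with its multicast bit set (the least
    significant bit of the first octet), marking it as not a real MAC.
    """
    random_node = secrets.randbits(48) | (1 << 40)
    return str(uuid.uuid1(node=random_node))


accession = new_accession_number()
print(accession, len(accession))  # the canonical 8-4-4-4-12 form is always 36 characters
```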

Sample Type
Required field; constrained by picklist. This field consists of a limited list: the field can either state "Wild Colony" or "Sexual Recruit." A Wild Colony is defined as either a detached fragment of opportunity or a fragment taken from a wild colony. A Sexual Recruit is defined as a coral created by assisted sexual reproduction during which gametes were harvested, fertilized, and subsequently settled in a lab or nursery setting.

Latitude
Required field; limited text. Latitude corresponds to the location where the sample was obtained. Entries can be made in any of three formats (Decimal Degrees; Degrees and Decimal Minutes; or Degrees, Minutes, Seconds), though all are converted and standardized to Decimal Degrees upon successful upload. For Sexual Recruits, this refers to the location of larval settlement (i.e., nursery or lab).
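Converting the three accepted formats to decimal degrees uses degrees + minutes/60 + seconds/3600, negated for southern latitudes and western longitudes. The sketch below assumes entries are whitespace- or symbol-separated numbers with an optional leading minus sign or trailing hemisphere letter; the registry's exact input syntax is not described, and the function name, the `max_abs` range check, and rounding to six decimal places (roughly 0.1 m) are illustrative choices.

```python
import re

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def to_decimal_degrees(text, max_abs=90.0):
    """Convert DD, DDM, or DMS coordinate text to signed decimal degrees.

    Accepts e.g. "25.0751", "25 04.506 N", or "80 22 30 W". A leading minus
    sign or an S/W hemisphere letter makes the result negative. Use
    max_abs=90 for latitude and max_abs=180 for longitude.
    """
    s = text.strip().upper()
    hemisphere = s[-1] if s and s[-1] in "NSEW" else None
    if hemisphere:
        s = s[:-1].strip()
    parts = [float(p) for p in _NUMBER.findall(s)]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees, minutes, seconds = (parts + [0.0, 0.0])[:3]
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"minutes and seconds must be in [0, 60): {text!r}")
    negative = s.startswith("-") or hemisphere in ("S", "W")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    if value > max_abs:
        raise ValueError(f"coordinate out of range: {text!r}")
    return round(-value if negative else value, 6)


print(to_decimal_degrees("25 04.506 N"))              # latitude, DDM
print(to_decimal_degrees("80 22 30 W", max_abs=180))  # longitude, DMS
```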

Longitude
Required field; handled as per Latitude.
Frontiers in Marine Science | www.frontiersin.org

Country
Required field; picklist. Specifies the sovereign nation from which the sample originated. For Sexual Recruits, this field refers to the country in which the recruit was settled, and thus the sovereign authority permitting the sample's handling. This is a more consistent designation than the country of larval origin because, with increasing application of cryopreservation and assisted gene flow in coral breeding, the timing of collection and geographic origin of coral larvae will become increasingly complex (e.g., egg and sperm collections may come from different countries in different years).

Region
Optional field; free form text. Specification of the local region of the source sample.

Subregion
Optional field; free form text. Specification of the local subregion of the source sample.

Reef Name
Optional field; free form text. Specification of the local reef name of the source sample.

Site Name
Optional field; free form text. Specification of the local site name of the source sample, often a specific area of a local reef.

Genus
Required field; corrected free text. Entries are compared against a standardized list of coral genera and species (World Register of Marine Species) for correct spelling. Only recognized genus-species combinations are accepted; mismatched entries are flagged for review.

Species
Required field; handled as per Genus.
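The Genus and Species checks can be sketched as a lookup of the normalized pair in an accepted-name list, with close spellings offered for review rather than silently applied. The four-name set below is an illustrative hard-coded excerpt, not a WoRMS download, and `check_taxon` and its 0.8 similarity cutoff are assumptions.

```python
import difflib

# Illustrative excerpt only; the registry checks entries against the World
# Register of Marine Species (WoRMS), not against this hard-coded set.
ACCEPTED_NAMES = {
    ("Acropora", "palmata"),
    ("Acropora", "cervicornis"),
    ("Orbicella", "faveolata"),
    ("Dendrogyra", "cylindrus"),
}


def check_taxon(genus, species, accepted=ACCEPTED_NAMES):
    """Normalize capitalization and accept only recognized genus-species pairs.

    Unrecognized pairs are flagged for review, with close spellings offered
    as suggestions rather than applied automatically.
    """
    g, s = genus.strip().capitalize(), species.strip().lower()
    if (g, s) in accepted:
        return {"status": "ok", "genus": g, "species": s}
    candidates = [f"{ag} {asp}" for ag, asp in sorted(accepted)]
    suggestions = difflib.get_close_matches(f"{g} {s}", candidates, n=3, cutoff=0.8)
    return {"status": "flagged", "genus": g, "species": s, "suggestions": suggestions}


print(check_taxon("acropora", "Palmata")["status"])        # ok
print(check_taxon("Acropora", "palmatta")["suggestions"])  # ['Acropora palmata']
```

Checking the pair, not each name separately, is what catches a valid genus combined with a species from another genus (e.g., "Orbicella palmata").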

Local Sample Name
Required field; free form text. This refers to the name or ID assigned to a sample by the collecting organization. This likely represents the putative genotype.

Collection Date
Required field; constrained format. For a Wild Colony, this refers to the date of collection from the wild. For a Sexual Recruit, this refers to the date of settlement.

Notes
Optional field; free form text. This field is used to incorporate pertinent information about a collection event. This includes, but is not limited to, information about the possible parents of a sexual recruit, information on the status of the parent wild colony from which a fragment was taken, or additional collection information that does not fit into the above fields.

Submitter
Generated field. Provides the name of the Registered User who made the entry, referenced from their account information.

Organization
Generated field. Provides the name of the Organization of the Registered User who made the entry, referenced from their account information.

Contact Information
Generated field. Provides the email address of the Registered User who made the entry, referenced from their account information.
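Taken together, the fields above fall into three groups: required user entries, optional free text, and values generated from the submitter's account or by the registry. The dataclass below is a hypothetical in-memory model of one entry (class and attribute names are assumptions; the registry's PostgreSQL schema is not shown in this paper), with a few of the documented constraints checked on construction.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

SAMPLE_TYPES = ("Wild Colony", "Sexual Recruit")


@dataclass
class CSRRecord:
    """One registry entry, grouped by how each field is populated."""

    # Required, entered by the user
    sample_type: str          # picklist: "Wild Colony" or "Sexual Recruit"
    latitude: float           # decimal degrees after conversion
    longitude: float          # decimal degrees after conversion
    country: str              # for Sexual Recruits, country of settlement
    genus: str
    species: str
    local_sample_name: str
    collection_date: date     # for Sexual Recruits, date of settlement
    # Generated from the submitting user's account
    submitter: str
    organization: str
    contact_email: str
    # Optional, free-form text
    region: Optional[str] = None
    subregion: Optional[str] = None
    reef_name: Optional[str] = None
    site_name: Optional[str] = None
    notes: Optional[str] = None
    # Generated by the registry when the entry is accepted
    accession_number: Optional[str] = None

    def __post_init__(self):
        if self.sample_type not in SAMPLE_TYPES:
            raise ValueError(f"Sample Type must be one of {SAMPLE_TYPES}")
        if not -90 <= self.latitude <= 90:
            raise ValueError("latitude out of range")
        if not -180 <= self.longitude <= 180:
            raise ValueError("longitude out of range")
        for name in ("country", "genus", "species", "local_sample_name"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required")


record = CSRRecord(
    sample_type="Sexual Recruit", latitude=25.0751, longitude=-80.375,
    country="United States", genus="Acropora", species="palmata",
    local_sample_name="AP-cross-01", collection_date=date(2021, 8, 20),
    submitter="Jane Doe", organization="Org X", contact_email="jane.doe@example.org",
    notes="Putative parents: AP-12 x AP-30",
)
```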

DISCUSSION
Protecting coral reefs and enhancing coral populations in the face of further anthropogenic change requires deeper insights into the biology of coral species and their ecological communities. As data are generated across various fields and by multiple researchers, we face the challenge of integrating this information into actionable management strategies, ideally fast enough to outpace the loss of coral cover. Reducing the time it takes for collected data to become actionable, by accessing and integrating the broadest collection of information possible, will be necessary for success. The Coral Sample Registry is designed as an essential first step toward this goal, providing a means to cross-reference, and thereby access, disparate information types related to coral samples.
The creation of the CSR accomplishes four major things for the field of coral conservation. First, it establishes a single, permanent record of coral samples. Entries will not be deleted from the registry, but new ones can be added at any time, making the CSR an up-to-date database of collections made across groups, species, and regions. Currently, no other data structures meet this need. Databases of genotype or restoration-nursery collection information are powerful research tools that capture high-resolution information for large numbers of corals, but they do not incorporate all collected samples, only subsets that qualify (e.g., those that have been genetically analyzed). Second, it standardizes the minimum set of information related to coral samples, regardless of group, species, or region. This directly addresses a need outlined in the Coral Restoration Consortium 2020-2025 priorities to better define terms associated with coral restoration for improved management (Coral Restoration Consortium, 2021; Vardi et al. in review).
Third, it offers a single point of reference for where coral samples have been accessioned, allowing spatial gaps in collections or potential redundancies to be easily identified while drawing attention to potential collection overlaps between groups. Presently, knowledge sharing about sampling events requires intensive effort from multiple groups to maintain several databases, each updated at different times. Rather than relying only on resource management agencies to provide population-level metrics about coral fragments in use by all groups, the CSR offers an up-to-date structure easily accessible by all parties. Finally, as an open-access repository, it allows all participating groups and resource management authorities to share and access information collaboratively, tackling broad problems as a unified community rather than as fractured segments.
The CSR provides a convenient method to accurately communicate among different data structures but does not guarantee mutual access or cross-platform integration of these sources. We recognize that while the CSR offers the potential for greater information integration and access, the challenge will be with its broad adoption. Key to the success of the CSR is the widespread registration of coral samples within the registry from all groups, the pairing of an accession number with how coral samples are used over time, and the inclusion of an accession number field in data structures currently in use or in development to track a fragment's origin. Adoption at each of these levels can be daunting. However, we remain optimistic given the growing desire to leverage coral-level information across all parties for both research and management.
For researchers, the ability to directly link the source of their collection material with the various attributes of individual strains is critical for elucidating genome-by-environment responses associated with various stress challenges. Interventions such as assisted migration, assisted gene flow, and selective breeding are becoming increasingly viable (National Academies of Sciences, Engineering, and Medicine, 2019), necessitating a link between the collection, genomic, and phenotypic information of individual samples. Therefore, we actively encourage research groups to incorporate a field corresponding to the CSR accession number into their current data repositories. Through incorporation of the common accession number, academic researchers would be able to access all collection information for samples under current or previous study, allowing comparison across multiple project-specific datasets that examine different aspects of a particular coral strain (e.g., heat stress tolerance, disease susceptibility, growth rates). Several groups have already incorporated a blank field to be populated with the generated accession numbers, so that their databases can be immediately cross-referenced using the common key provided by the CSR. Examples of these data structures are the NOAA Acropora palmata Population Management Database, the Pennsylvania State University Acroporid Genotype Database, the Caribbean Coral Spawning Monitoring Database, and the NOAA AOML Coral Program physiological database (in development).
For restoration practitioners, the CSR provides a free, easy-to-use resource to monitor their collection activities and inventory collected samples. By directly tying a collection event to an initial coral sample, a clear link is established between the collection event and the subsequent lineage of that sample through asexual reproduction, currently the dominant form of propagation for many coral restoration practitioners (Hein et al., 2021). As we learn more about the phenotypic plasticity of various coral species, it is becoming clear that the variance associated with a particular genotype in different environments is complex. Therefore, when comparing the survivorship and performance of the same genotype, the location and time of its collection may be important covariates. Tracking these lineages for testing and observation is a growing priority, facilitated by the CSR. Swapping of coral strains among research groups and/or practitioners, as is common in areas with large restoration programs like the Florida Reef Tract, is an increasingly important means of diversifying populations. However, this beneficial practice can become problematic without sufficient controls to ensure transparent transfer of collection information, such as those provided by a registered accession number. Restoration practitioners would incorporate the accession number into all existing data structures, allowing long-term analysis of strains outplanted across multiple years in different locations and quantities. We encourage all restoration practitioners to register all current and future samples in the CSR, especially during coral swaps between groups.
For resource managers, including governmental permitting agencies, the CSR provides an essential tool to ensure efficient coordination of restoration efforts while protecting natural populations. A resource such as the CSR provides readily available summary information on a sample's origin and therefore an estimate of the relative diversity of coral stocks across organizations without having to invest in development of new systems. The CSR is designed to be integrated into existing management systems, either through manual uploads or a direct application programming interface. The CSR removes the burden of sharing collection and stock information from the management agency by placing it in a publicly accessible forum, which also facilitates better coordination. Resource management agencies will be able to mine all registered samples in an area to gain an understanding of overlap between groups, across species, or perhaps identify areas that should be scouted for unknown wild colonies. The CSR provides a tool for managing corals as a population rather than group-owned stocks. As new territories and nations expand their coral restoration efforts, we encourage the inclusion of CSR registration as part of their permitting pipeline.
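As an example of the overlap query a manager might run on a registry export, the sketch below lists taxon and location combinations sampled by more than one organization. The function name, record layout, and values are hypothetical. Because Reef Name is optional, rows without it group together under `None`; grouping on rounded latitude and longitude would be an alternative.

```python
from collections import defaultdict


def collection_overlap(records, keys=("genus", "species", "reef_name")):
    """Return taxon/location combinations sampled by more than one organization."""
    orgs = defaultdict(set)
    for r in records:
        orgs[tuple(r.get(k) for k in keys)].add(r["organization"])
    return {group: sorted(o) for group, o in orgs.items() if len(o) > 1}


registered = [
    {"genus": "Acropora", "species": "palmata", "reef_name": "Reef 1", "organization": "Org X"},
    {"genus": "Acropora", "species": "palmata", "reef_name": "Reef 1", "organization": "Org Y"},
    {"genus": "Orbicella", "species": "faveolata", "reef_name": "Reef 2", "organization": "Org X"},
]
print(collection_overlap(registered))
# {('Acropora', 'palmata', 'Reef 1'): ['Org X', 'Org Y']}
```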
A potential benefit of the CSR that spans all stakeholder groups concerns recordkeeping and analysis in the case of natural resource damage. Legal remedies for such damage caused by an anthropogenic event require strict chain-of-custody for samples that can be greatly facilitated by the CSR, while simultaneously providing a one-stop record of existing pre-disaster samples that might be accessed for reference. As an example, the 2010 Deepwater Horizon oil spill prompted emergency coral sampling in advance of an anticipated arrival of oil contamination via the Gulf Stream. The de novo invention of a chain-of-custody system for these samples was a significant component of this effort. With a functional CSR in place, this recordkeeping effort (and to some extent the sampling effort as well) may have been much reduced.
Finally, we encourage funders and publishers to ask that coral-related submissions register their samples with the CSR and include accession numbers, as is common practice for permits and samples. In this way, information can be transparent and publicly accessible, enabling investments and outcomes to have the broadest possible impact.
The needs for standardized terms and metrics across coral restoration, as well as for data management structures to collect, store, and share key data, are well defined by coral restoration management agencies (Coral Restoration Consortium, 2021; Florida Fish and Wildlife Conservation Commission, 2021). The Coral Sample Registry helps to fill these needs by correlating information related to coral samples across multiple sources. By itself, it can serve as an invaluable tool to further collaboration, document and track the origin of restoration materials, provide insight into the sampling of wild populations, and facilitate knowledge sharing among groups. Collectively, this helps close an important knowledge gap, increasing confidence that a complete picture of sampling efforts is available and lowering the risk of missed information. However, the CSR's efficacy will depend on its adoption as a repository for sample collections and on the subsequent association of sample accession numbers with derivative efforts. Toward the greater good of considering collected samples as part of a large meta-population, we encourage restoration practitioners, researchers, and management agencies to adopt the CSR accession number standard, institutionalizing its inclusion where possible. Alone, the CSR represents the potential for greater insights; it will be up to the broader community to use the accession number as a key to unlock information across data repositories.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author(s).

AUTHOR CONTRIBUTIONS
AmM managed the project's development and produced the original draft of the manuscript, with significant contributions from BB, RD, LM, MM, JM, AlM, and RW. All authors contributed equally to the design of the project.

FUNDING
Funding in support of this technology was provided in part by the Wallace Research Foundation and the Paul M. Angell Family Foundation. The funding from both foundations was directed toward the database design and development.