Seven Recommendations to Make Your Invasive Alien Species Data More Useful


Science-based strategies to tackle biological invasions depend on recent, accurate, well-documented, standardized and openly accessible information on alien species. Currently and historically, biodiversity data are scattered in numerous disconnected data silos that lack interoperability. The situation is no different for alien species data, and this obstructs efficient retrieval, combination, and use of these kinds of information for research and policy-making. Standardization and interoperability are particularly important as many alien-species-related research and policy activities require pooling data. We describe seven ways that data on alien species can be made more accessible and useful, based on the results of a European Cooperation in Science and Technology (COST) workshop: (1) Create data management plans; (2) Increase interoperability of information sources; (3) Document data through metadata; (4) Format data using existing standards; (5) Adopt controlled vocabularies; (6) Increase data availability; and (7) Ensure long-term data preservation. We identify four properties specific and integral to alien species data (species status, introduction pathway, degree of establishment, and impact mechanism) that are either missing from existing data standards or lack a recommended controlled vocabulary. Improved access to accurate, real-time and historical data will repay the long-term investment in data management infrastructure, by providing more accurate, timely and realistic assessments and analyses. If we improve core biodiversity data standards by developing their relevance to alien species, it will allow the automation of common activities regarding data processing in support of environmental policy. Furthermore, we call for considerable effort to maintain, update, standardize, archive, and aggregate datasets, to ensure proper valorization of alien species data and information before they become obsolete or lost.

INTRODUCTION
Sound decision-making to minimize the risk associated with the introduction of alien species requires accurate and up-to-date data and the knowledge derived from them. These data feed into a wide range of processes to tackle problematic invasive alien species and are needed to develop an appropriate, evidence-based response (Table 1). Horizon scanning (the systematic examination of future potential threats and opportunities, leading to their prioritization), risk assessment, risk management, early detection and rapid response all depend on accurate and accessible data [1][2][3][4]. So, although alien species data differ little from data on other species, the demands we place on these data are considerable and specific.
Current invasive alien species policies depend on the availability and quality of data. For example, EU Regulation No 1143/2014 on Invasive Alien Species [5] requires member states to report on the status of invasive alien species of Union concern and on their progress in managing them; similar regulations exist in other countries, such as the USA [6]. Responsible authorities need access to timely and validated data, and they need to report these data in a standardized way so that they can be collated nationally and internationally. Within the EU, the European Alien Species Information Network (EASIN) [7,8] has been developed to this end, including a mechanism for quality assurance, safeguarding and improvement [9].
Mitigating and preventing biological invasions present particular challenges with regard to the quality, relevance and scope of data sources and infrastructure [10]. The numerous origins of the data and broad taxonomic scope, combined with the global geographic extent and input from diverse disciplines, make proper handling of alien species data difficult, but also necessary. With this perspective, we gathered database managers, data users, data generators and biodiversity informatics specialists to outline how alien species data can be made more useful, taking into account the peculiarities and applications of such data. This resulted in seven recommendations, which, if followed, would improve the use of alien species data for research, policy and management purposes. Some of these recommendations are not unique to alien species data, but their impact would be particularly significant in this discipline.

Table 1 (excerpt). Alien species checklists; horizon scanning (e.g., [2]); selection of species for risk assessment (e.g., [46]); analysis of pathways of introduction and spread (e.g., [43,61,62]); evaluation of effectiveness of control actions; cost-benefit analysis of control actions (e.g., [74,75]).

The workshop consisted of talks and participatory exercises on four main invasive alien species themes: risk assessment, horizon scanning, management and monitoring. For each of these themes, participants reflected on the data needs and requirements (Table 1), the data sources they commonly use, and the existing data standards. Materials from the workshop have been deposited in an open repository [11]. Conclusions reported by breakout groups were refined and supplemented in facilitated plenary discussion. Particular attention was paid to the perspectives of both the data publishers and data users.
During the workshop, a number of opportunities for facilitating the proper use and valorization of alien species data were identified; these resulted in the recommendations presented below and summarized in Table 2.

CREATE DATA MANAGEMENT PLANS
A data management plan (DMP) describes how the information generated by a project will be handled both during and after the project. These plans define responsibilities; aim to avoid data loss and incompatibility by indicating how data will be preserved and formatted; stipulate what metadata are required to understand the data; and consider data sharing options, including licensing [12].
Such plans are a means to improve data management and are now commonly required by funding agencies. The US National Science Foundation has required them since 2010 [13], and in 2013 the European Commission launched a pilot on open research data requiring a DMP within the first 6 months of the project [14]. The DMP approach also encourages journals to change their policies toward the archiving of data, though it is taking time for the whole scientific community to embrace these changes [15,16]. Typical minimum sections of a DMP are: (i) What type of data and metadata are expected? (ii) Which standards are used for alien species data? (iii) How should data be shared? (iv) How should data be permanently preserved? Researchers new to writing a DMP should refer to their institutional and funding agency guidelines, if any, and, with respect to invasive species data, to recommendations for ecologists [6,17].
Strictly speaking, each recommended action could be implemented without the need to compile a DMP. However, preparing and agreeing upon a DMP ensures a holistic approach to data management and increases its openness and accountability, while also answering the needs from funding agencies and institutional data policies [12], so we recommend their use.
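The four minimum DMP sections listed above lend themselves to a simple completeness check. The sketch below is only an illustration: the `DataManagementPlan` class, its field names, and `missing_sections` are hypothetical and do not correspond to any funder template or to a machine-actionable DMP standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataManagementPlan:
    """Hypothetical record with one field per minimum DMP section."""
    data_and_metadata_types: str = ""                    # (i) data and metadata expected
    standards: list[str] = field(default_factory=list)   # (ii) standards used
    sharing: str = ""                                     # (iii) how data will be shared
    preservation: str = ""                                # (iv) how data will be preserved


def missing_sections(plan: DataManagementPlan) -> list[str]:
    """Return the names of minimum sections that are still unanswered."""
    missing = []
    for name, value in vars(plan).items():
        # Empty or whitespace-only text and empty lists count as unanswered.
        answered = bool(value.strip()) if isinstance(value, str) else bool(value)
        if not answered:
            missing.append(name)
    return missing


plan = DataManagementPlan(
    data_and_metadata_types="Occurrence records of alien plants, with EML metadata",
    standards=["Darwin Core", "EML"],
)
print(missing_sections(plan))  # ['sharing', 'preservation']
```

Flagging gaps early in the project, rather than at its end, is the point of the exercise; the structure could equally live in a spreadsheet or a funder's online form.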

DOCUMENT THROUGH METADATA
Good metadata provide information on provenance, scope, methods, limitations, data formats and units to facilitate correct data use, as well as license and contact information. The USGS Data Management Web site² lists multiple tools and best practices for metadata creation. Several metadata standards for biodiversity data are available, such as Ecological Metadata Language (EML [18]), adopted by GBIF [19]; the INSPIRE directive framework (Infrastructure for Spatial Information in Europe)³, which describes geospatial data; and the Data Catalog Vocabulary (DCAT)⁴, which describes datasets. We have not identified any specific metadata standards for alien species data and recommend the use of the metadata standards above, for which tools and services are already available [20]. An example of a tool for metadata standardization is the desktop application Morpho⁵, which guides users through the creation of EML [21]. Morpho can interface with a Metacat registry to provide a searchable catalog of ecological datasets. This technology is used by both the DataONE repository⁶ and the European Biodiversity Observation Network (EU BON) [22]. Creating metadata may seem secondary to primary data curation, but metadata are essential to ensure the data can be discovered and used in the long term [23]. In the context of alien species data, improved access to metadata could enhance the speed with which data are found and mobilized.

² https://www2.usgs.gov/datamanagement/describe/metadata.php
³ http://inspire.ec.europa.eu
⁴ https://www.w3.org/TR/vocab-dcat/
⁵ https://knb.ecoinformatics.org/#tools/morpho
⁶ http://dataone.org

Table 2. Summary of recommendations.
1 Create and implement data management plans to define the alien species data life-cycle, good data quality and metadata, standardization, data sharing options, and long-term data preservation.
2 Describe alien species data through metadata, so users can understand their scope and limitations, and use metadata standards (EML, INSPIRE) to facilitate metadata exchange.
3 Improve interoperability and sustainability of existing and new alien species information sources by exposing the data they contain through standard exchange formats.
4 Format data using existing standards (Darwin Core, GISIN) and engage in their development through TDWG.
5 Adopt controlled vocabularies to further increase interoperability of data and engage with TDWG to make these compatible with existing standards.
6 Increase data availability by making alien species data openly accessible as soon as possible after collection.
7 Ensure long-term preservation of alien species data by archiving these in existing data repositories (GBIF, Zenodo).

Frontiers in Applied Mathematics and Statistics | www.frontiersin.org
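To show what a metadata record contains in practice, the sketch below builds a very small EML document with Python's standard library. It assumes the EML 2.2.0 namespace; the `packageId`, `system` value, names and e-mail address are placeholders. The output has not been validated against the EML schema, and GBIF's own metadata profile may impose further requirements, so tools such as Morpho remain the practical route.

```python
import xml.etree.ElementTree as ET

# Assumed EML version; portals may expect a different version or profile.
EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"
ET.register_namespace("eml", EML_NS)


def minimal_eml(package_id: str, title: str, creator_surname: str,
                contact_email: str, abstract: str, rights: str) -> str:
    """Build a minimal, unvalidated EML document as a string."""
    # Only the root is namespace-qualified; EML child elements are unqualified.
    root = ET.Element(f"{{{EML_NS}}}eml",
                      {"packageId": package_id, "system": "https://example.org"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    # creator: who produced the data
    creator = ET.SubElement(dataset, "creator")
    ET.SubElement(ET.SubElement(creator, "individualName"), "surName").text = creator_surname
    # abstract and intellectualRights carry free text in <para> elements
    ET.SubElement(ET.SubElement(dataset, "abstract"), "para").text = abstract
    ET.SubElement(ET.SubElement(dataset, "intellectualRights"), "para").text = rights
    # contact: whom users should ask about the data
    contact = ET.SubElement(dataset, "contact")
    ET.SubElement(contact, "positionName").text = "Data manager"
    ET.SubElement(contact, "electronicMailAddress").text = contact_email
    return ET.tostring(root, encoding="unicode")


doc = minimal_eml("example.alien.1", "Alien plant occurrences (example)", "Doe",
                  "data@example.org", "Illustrative abstract.", "CC0 1.0")
print(doc)
```

Even this skeleton covers the provenance, scope, license and contact information listed above. A real record would also describe coverage, methods and data tables.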

IMPROVED INTEROPERABILITY OF INFORMATION SOURCES
Information on alien species is scattered among a multitude of sources, including databases; peer-reviewed and gray literature; unpublished research projects and institutional datasets [8,24]. Important international sources of these data include […]. Any new initiative to collate data needs to consider its role and define its niche within a complex environment of global, continental, national and regional data repositories [7,26].
Almost any effort to compile and harmonize data from these sources is impeded by differences in field names, definitions, and taxonomy, as well as access and license restrictions [3,27]. The use of common standards for all these aspects can improve the interoperability of these data sources: their data can be more efficiently exchanged, combined, compared, and presented. In addition, data processing should ideally be performed in a repeatable way, to increase the efficiency of activities such as horizon scanning and risk assessment. For invasion policies to be proactive, these activities should be repeated at regular intervals [2].
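The field-name problem described above can be handled with a declared, repeatable mapping from each source's columns to shared terms. In this minimal sketch the two source names, their column names and the records are made up for illustration. The target names (`scientificName`, `eventDate`, `countryCode`, `establishmentMeans`) are Darwin Core terms.

```python
# Hypothetical source columns mapped onto Darwin Core term names.
SOURCE_MAPPINGS = {
    "national_checklist": {
        "species": "scientificName", "obs_date": "eventDate",
        "country": "countryCode", "status": "establishmentMeans",
    },
    "regional_survey": {
        "taxon_name": "scientificName", "date": "eventDate",
        "iso_country": "countryCode", "origin": "establishmentMeans",
    },
}


def harmonize(source: str, records: list[dict]) -> list[dict]:
    """Rename each record's fields to the shared terms; fail loudly on unknown fields."""
    mapping = SOURCE_MAPPINGS[source]
    harmonized = []
    for record in records:
        unmapped = set(record) - set(mapping)
        if unmapped:
            raise ValueError(f"{source}: no mapping for {sorted(unmapped)}")
        harmonized.append({mapping[name]: value for name, value in record.items()})
    return harmonized


# Made-up records: equivalent content, different field names.
combined = (
    harmonize("national_checklist", [{"species": "Impatiens glandulifera",
              "obs_date": "2016-07-12", "country": "BE", "status": "introduced"}])
    + harmonize("regional_survey", [{"taxon_name": "Impatiens glandulifera",
                "date": "2017-08-03", "iso_country": "BE", "origin": "introduced"}])
)
print(combined)
```

Because the mapping is data rather than ad hoc editing, the same step can be rerun whenever a source is updated, which supports the periodic horizon scanning and risk assessment described above.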
Online alien species catalogs and invasive alien species information systems are difficult to keep up-to-date [28,29], yet they provide a wide variety of valuable information. Funding for these initiatives has been sporadic at best [28] and is often time-limited [29]. Thus, information stored and managed within such initiatives quickly becomes outdated, and efforts to keep it updated are often suddenly discontinued. This tends to spread errors to other systems that are populated with data from such sources, particularly if provenance is poorly tracked. As such, the current process restricts alien species data exchange, aggregation, interoperability and even rescue. Technological advances have boosted the number of initiatives [30], but have also increased the volume and complexity of the data [23,31,32,33]. A holistic approach to complex biological questions requires more from data than a traditional reductionist approach, as demonstrated by the success of the Gene Ontology [34]. Yet this poses additional challenges of ensuring data quality, data curation, interoperability and future-proofing against obsolete technology and increasing data volumes [35]. Technological change promises many improvements in data collection, with systems such as smartphones equipped with built-in GPS, image capture, external sensors, and automated and expert validation [31]. Likewise, advances in species detection through environmental DNA, such as those of Dejean et al. [32], need support if they are to be included in alien species initiatives.
We recommend that alien species databases work together to follow common standards and that these standards are further developed for emerging data streams.

FORMAT DATA USING EXISTING STANDARDS
Within the scope of a single dataset, data only need to be formatted consistently to be usable. However, to combine datasets for broad-scale analysis, a community-defined exchange format or standard is required to allow data interoperability. Among the qualities of a "good standard" are that it be readable (by both humans and machines), simple, learnable and efficient [36].
The alien species research community is not universally aware of biodiversity informatics standards, where they come from and how they can be extended. Standards for the exchange of biodiversity data, including alien species data, are developed, discussed and promoted by the Biodiversity Information Standards organization, TDWG [37]. This organization is the guardian of Darwin Core, the most widely adopted standard to exchange biodiversity information related to species [38]. By following these standards, data managers can avoid duplication of effort and mistakes. Furthermore, the organization can give advice and support for updating existing standards and proposing new ones. It is recommended that the invasive alien species community continue to engage in TDWG, both to adopt standards for common terms and to establish standards specific to invasion biology.
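As an example of what "formatting data using Darwin Core" involves, the sketch below writes a tab-separated occurrence core file and the `meta.xml` descriptor used in a Darwin Core Archive, which tells consumers which column holds which term. The choice of columns is an assumption, and the referenced `eml.xml` metadata file is not generated here. Consult the Darwin Core text guide before relying on the details.

```python
import csv
import io
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
TEXT_NS = "http://rs.tdwg.org/dwc/text/"
ET.register_namespace("", TEXT_NS)  # serialize the archive namespace as the default

# Illustrative column choice; occurrenceID doubles as the row identifier.
TERMS = ["occurrenceID", "scientificName", "eventDate", "countryCode", "establishmentMeans"]


def occurrence_table(records: list[dict]) -> str:
    """Tab-separated core file with Darwin Core term names as the header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(TERMS)
    for record in records:
        writer.writerow([record.get(term, "") for term in TERMS])  # missing -> empty
    return buf.getvalue()


def meta_xml() -> str:
    """Archive descriptor mapping column indexes to Darwin Core term URIs."""
    archive = ET.Element(f"{{{TEXT_NS}}}archive", {"metadata": "eml.xml"})
    core = ET.SubElement(archive, f"{{{TEXT_NS}}}core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",  # literal backslash-t, as written in meta.xml
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": "",
        "ignoreHeaderLines": "1",
        "rowType": DWC + "Occurrence",
    })
    files = ET.SubElement(core, f"{{{TEXT_NS}}}files")
    ET.SubElement(files, f"{{{TEXT_NS}}}location").text = "occurrence.txt"
    ET.SubElement(core, f"{{{TEXT_NS}}}id", {"index": "0"})
    for index, term in enumerate(TERMS):
        ET.SubElement(core, f"{{{TEXT_NS}}}field", {"index": str(index), "term": DWC + term})
    return ET.tostring(archive, encoding="unicode")


table = occurrence_table([{"occurrenceID": "obs-1", "scientificName": "Impatiens glandulifera",
                           "countryCode": "BE", "establishmentMeans": "introduced"}])
print(table)
print(meta_xml())
```

The descriptor is what lets aggregators read datasets from many publishers without per-source parsing code, which is the duplication of effort that following the standard avoids.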

ADOPT CONTROLLED VOCABULARIES FOR FOUR ALIEN SPECIES PROPERTIES
In addition to a standard format to exchange data, specialist communities often require further control over the values of terms to increase interoperability. This can be achieved by adopting controlled vocabularies, which not only allow data to be merged but also contribute to the normative definition of a term.
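The merging benefit can be illustrated with a small lookup that maps free-text values onto preferred terms and rejects everything else. The vocabulary here is hypothetical: three terms with a few spelling variants, chosen only to show the mechanism. It is not a recommended vocabulary.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted variants (lower case).
VOCABULARY = {
    "casual": {"casual"},
    "established": {"established", "naturalized", "naturalised"},
    "invasive": {"invasive"},
}
# Reverse index for lookups: variant -> preferred term.
LOOKUP = {variant: preferred
          for preferred, variants in VOCABULARY.items()
          for variant in variants}


def normalize(value: str) -> str:
    """Map a free-text value onto its preferred term, or reject it."""
    key = value.strip().lower()
    if key not in LOOKUP:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return LOOKUP[key]


print([normalize(v) for v in ["Naturalised", " established", "INVASIVE"]])
# ['established', 'established', 'invasive']
```

Rejecting unknown values, rather than passing them through, is a deliberate choice: it surfaces terms that need review instead of letting them silently fragment a merged dataset.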
Four alien species properties were identified that are either missing from Darwin Core or lacking a reference to a recommended controlled vocabulary. These are introduction pathway, degree of establishment, impact mechanism, and species status. For each of these, vocabularies exist outside Darwin Core, yet these currently exist as frameworks and require further work to be developed into standards.
For pathway terminology, the need for a consistent classification, hierarchy, and terminology has long been recognized [39][40][41]. Meanwhile, a standardized hierarchical pathway classification was adopted by parties to the Convention on Biological Diversity [42] and is being applied to existing databases [9,43].
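A hierarchical classification lets records coded at a fine level be summarized at a coarser one. The sketch below rolls subcategories up to main categories. The mapping is an abridged, paraphrased excerpt written for illustration and is not the official CBD list. Consult [42] for the authoritative categories and wording.

```python
from collections import Counter

# Abridged, paraphrased excerpt: subcategory -> main pathway category.
PATHWAY_PARENT = {
    "biological control": "release in nature",
    "horticulture": "escape from confinement",
    "pet/aquarium/terrarium species": "escape from confinement",
    "contaminant nursery material": "transport - contaminant",
    "ship/boat ballast water": "transport - stowaway",
    "ship/boat hull fouling": "transport - stowaway",
    "interconnected waterways/basins/seas": "corridor",
    "natural dispersal across borders": "unaided",
}


def main_category_counts(pathways: list[str]) -> Counter:
    """Count introduction records per main pathway category."""
    return Counter(PATHWAY_PARENT[p.strip().lower()] for p in pathways)


records = ["Horticulture", "ship/boat ballast water", "pet/aquarium/terrarium species"]
print(main_category_counts(records))
```

Databases that record only the main category and those that record subcategories can then still be compared at the shared, coarser level.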
A framework for the degree of establishment has been presented by Blackburn et al. [44]. This hierarchical classification provides a terminology for populations at different points in the invasion process (casual/introduced, alien, naturalized/established, invasive) and allows expression of the range of establishments from those organisms only kept in cultivation or captivity through to full naturalization and invasiveness.
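Because the degree of establishment is ordered, encoding it as an ordered type allows queries such as "established or further along". The four stages below are a simplified, hypothetical ordering for illustration only, not the full set of categories in Blackburn et al. [44]. The population names are placeholders.

```python
from enum import IntEnum


class Establishment(IntEnum):
    """Simplified, hypothetical ordering of invasion stages."""
    CAPTIVE_OR_CULTIVATED = 1
    CASUAL = 2
    ESTABLISHED = 3
    INVASIVE = 4


def at_least(populations: dict[str, Establishment], threshold: Establishment) -> list[str]:
    """Names of populations at or beyond the given stage."""
    return [name for name, stage in populations.items() if stage >= threshold]


populations = {"pop A": Establishment.CASUAL,
               "pop B": Establishment.INVASIVE,
               "pop C": Establishment.ESTABLISHED}
print(at_least(populations, Establishment.ESTABLISHED))  # ['pop B', 'pop C']
```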
For alien species impact, a classification of categories based on the magnitude of environmental impacts was developed by Blackburn et al. [45] and was adopted by the IUCN in 2016. However, for impacts other than environmental ones, such as socioeconomic, plant health, human health and animal health impacts, no comprehensive overview is available, but several risk assessment protocols have been developed that can provide inspiration for classifications (see [46] for an overview).
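The magnitude categories of this classification are Minimal Concern (MC), Minor (MN), Moderate (MO), Major (MR) and Massive (MV). The sketch below assumes, as we understand the framework, that a species is assigned the most severe impact recorded for it across mechanisms. The records are made up, and the function name `classify` is hypothetical.

```python
from enum import IntEnum
from typing import Optional


class EICAT(IntEnum):
    """Impact magnitude categories, ordered from least to most severe."""
    MC = 1  # Minimal Concern
    MN = 2  # Minor
    MO = 3  # Moderate
    MR = 4  # Major
    MV = 5  # Massive


def classify(impacts: list[tuple[str, EICAT]]) -> Optional[EICAT]:
    """Most severe category among (mechanism, category) records, or None if none recorded."""
    categories = [category for _mechanism, category in impacts]
    return max(categories) if categories else None


record = [("competition", EICAT.MN), ("hybridisation", EICAT.MR)]
print(classify(record).name)  # MR
```

Keeping the mechanism next to each category preserves the evidence behind the final classification, which is what a controlled vocabulary for impact mechanisms would standardize.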
Standards from the trade and agriculture sectors can be useful in describing species status, for example, the International Plant Protection Convention's International Standards for Phytosanitary Measures (ISPMs), specifically ISPM 8⁷, Determining pest status in an area, and ISPM 6⁸, Guidelines for surveillance. We recommend that these controlled vocabularies be expressed in a machine-readable format and be referenced from the appropriate terms in Darwin Core. This is in line with the recommendations of the GBIF Task Group on Data Fitness for use in Research into Invasive Alien Species [33].
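A machine-readable vocabulary mainly needs stable identifiers for each concept and an explicit link to the term it populates. In the sketch below, the base URI, the JSON layout (`appliesTo`, `uri`, `prefLabel`) and the concepts themselves are placeholders. The pairing with Darwin Core's `occurrenceStatus` term is only an illustrative choice. A published vocabulary would more likely use a W3C standard such as SKOS.

```python
import json

# Placeholder base URI; a real vocabulary would be published at a stable address.
BASE_URI = "https://example.org/vocab/species-status/"
# Darwin Core term the vocabulary is meant to populate (illustrative pairing).
DWC_TERM = "http://rs.tdwg.org/dwc/terms/occurrenceStatus"


def vocabulary_document(concepts: list[tuple[str, str]]) -> str:
    """Serialize (identifier, label) pairs as a small JSON vocabulary."""
    doc = {
        "appliesTo": DWC_TERM,
        "concepts": [{"uri": BASE_URI + identifier, "prefLabel": label}
                     for identifier, label in concepts],
    }
    return json.dumps(doc, indent=2)


def is_valid(value_uri: str, vocabulary_json: str) -> bool:
    """Check whether a value is one of the vocabulary's concept URIs."""
    return value_uri in {c["uri"] for c in json.loads(vocabulary_json)["concepts"]}


vocab = vocabulary_document([("present", "Present"), ("absent", "Absent")])
print(vocab)
print(is_valid(BASE_URI + "present", vocab))
```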
Additionally, controlled vocabularies might prove helpful in the dissemination of information on species management [47]. Good examples are the Global Eradication and Response Database [48] and the Database of Island Invasive Species Eradications [49]. The documentation of management actions in the field and the storage of these data are key to performing cost-benefit analyses of management measures.

INCREASE DATA AVAILABILITY
Much has already been written about the methods and needs for open data publication [3,17,50]. Beyond good intentions, Invasivesnet is a developing global association for open knowledge and open data on alien species [51]. This association will facilitate greater understanding, communication, and improved management of biological invasions globally, by developing a sustainable network of networks for effective knowledge exchange. The association fosters tool development and cyberinfrastructure for the collection, management and dissemination of data and information on alien species from a range of sources (e.g., research, citizen science). The key point is that data should be shared and standardized to ensure interoperability [52]. For species observation data, a straightforward solution is to publish through a repository such as GBIF or the Ocean Biogeographic Information System (OBIS), as this ensures adherence to a minimum set of common standards.
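One practical benefit of publishing through an aggregator is uniform access. The sketch below builds a query URL for GBIF's public occurrence search API using its `scientificName`, `country` and `limit` parameters. The fetch helper requires network access and is not called here. Check the GBIF API documentation for current parameters, paging and rate limits.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def gbif_search_url(scientific_name: str, country: str, limit: int = 20) -> str:
    """Build an occurrence search URL (country as an ISO 3166-1 alpha-2 code)."""
    params = {"scientificName": scientific_name, "country": country, "limit": limit}
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def fetch_results(url: str) -> list[dict]:
    """Download one page of results; requires network access."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)["results"]


url = gbif_search_url("Impatiens glandulifera", "BE")
print(url)
```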
There can be little doubt that data sharing using community standards and adequate metadata benefits research and society in general [53]. Yet motivating good data management is not easy when practitioners are not rewarded for it by their institutions. However, this is changing [54,55], particularly with the support of aspirational statements such as the Berlin⁹ and Bouchout¹⁰ declarations, which show the willingness of some institutions and individuals to change. Also, there are now policy initiatives in place, such as the EU INSPIRE directive¹¹ or the United States Administration's Open Data Policy¹²,¹³, to mandate harmonization of spatial data.

⁷ https://www.ippc.int/en/publications/612/
⁸ https://www.ippc.int/en/publications/615/

ENSURE LONG-TERM DATA PRESERVATION
Under ideal circumstances, databases would have funding for maintenance and updating for as long as they are useful; however, this is unrealistic. Furthermore, the end of a project is the wrong time to consider the long-term persistence of data [29,56]. Data actively being curated are often best maintained close to their source; however, longevity can be built into procedures by periodically depositing data in an open repository, not just on a personal or university website. This protects data from catastrophic events, human attrition, and the slow degradation of obsolescent hardware, which is the fate of much data [57]. If a publication is based upon a specific dataset, it is good practice to deposit that precise version in a repository.
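One simple way to pin down "that precise version" is to record a cryptographic checksum alongside the version label when the dataset is deposited. Anyone can later confirm that a downloaded copy is byte-for-byte the one that was analysed. The manifest layout below is a hypothetical example and not a repository requirement.

```python
import hashlib
from pathlib import Path


def manifest_for_bytes(file_name: str, data: bytes, version: str) -> dict:
    """Record which exact bytes correspond to a given dataset version."""
    return {"file": file_name, "version": version,
            "sha256": hashlib.sha256(data).hexdigest()}


def version_manifest(path: str, version: str) -> dict:
    """Same as manifest_for_bytes, reading the dataset file from disk."""
    file_path = Path(path)
    return manifest_for_bytes(file_path.name, file_path.read_bytes(), version)


m = manifest_for_bytes("occurrence.txt", b"occurrenceID\tscientificName\n", "1.0.0")
print(m["version"], m["sha256"])
```

Citing both the version label and the checksum in the publication ties the analysis to the deposited copy even if the live database continues to change.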
Not all repositories are the same. For example, Dryad¹⁴ and Zenodo¹⁵ are general-purpose repositories able to accept data in ad hoc formats, not necessarily formatted to community standards. They provide flexibility; however, repositories dedicated to one data type provide much greater opportunities for integration because they enforce standards. Examples of such repositories are GBIF and GenBank [58]. Repositories also differ in their ability to embargo the release of data and in their licensing options. We recommend giving considerable thought, a priori, to data preservation and the choice of repository.

CONCLUSION
Many alien species databases have emerged either before or without knowledge of existing standards for database management in biodiversity informatics. Furthermore, existing standards do not adequately cover all the needs of the research domain. Not all ecologists have strong information technology skills, nor are they all experts in technology-mediated collaboration, shared instrumentation or standardized data collection [59]. In the rapidly changing information technology landscape, ecologists and conservationists cannot be expected to keep up with developments in software and data standards. This should encourage data managers, wherever possible, to simplify the tools created for ecological practitioners. This becomes more pressing as new technologies are used to provide data on alien species.
Many data management issues are common to all biodiversity data, yet species native range, introduction pathway, degree of establishment and impact mechanism are specific to alien species. Additionally, the need for fast dissemination of information and data is typical of alien species, in particular in early detection and rapid response programs. Proactive responses to biological invasions require repeatable workflows for horizon scanning and risk assessment [60]. Adoption of standards and controlled vocabularies for this information can boost its usefulness for alien species research, policy-making and policy evaluation. There is a need for the acceptance of common data standards that take into consideration the needs of both data collectors and diverse data users, from the science community to the end user.
Work is required with the research and education communities and the standards authorities to ensure that suggested standards are shepherded through acceptance and implementation and that these standards are introduced early within the education of young scientists and promoted among those in the biodiversity community, so that they are adopted widely. Improving core biodiversity standards for their content and usefulness for alien species data will allow the automation of common activities needed to tackle biological invasions. We call for considerable effort toward maintaining, updating, standardizing, and archiving or incorporating current data sets, to ensure proper valorization of alien species data and resulting information before they become obsolete or lost.

AUTHOR CONTRIBUTIONS
QG, ACC, JP, SV, and TA wrote the original briefing note, which outlined the idea of a workshop on biodiversity data interoperability for invasive species. SV, QG, TA, PD, AD were the local organizers of the Workshop and prepared the initial draft of the paper. HR is Chair of the COST Action and has supported and attended the workshop. AS participated in the workshop, contributed to the writing of the paper, and arranged for the initial peer review of the manuscript through the U.S. Geological Survey. All other authors contributed to the writing of the paper and attended the workshop.

FUNDING
This article is based upon work from COST Action TD1209 ALIEN Challenge, supported by COST (European Cooperation in Science and Technology) www.cost.eu. JP was partly supported by the long-term research development project no. RVO 67985939.