Recommendations for the Standardisation of Open Taxonomic Nomenclature for Image-Based Identifications

This paper recommends best practice for the use of open nomenclature (ON) signs applicable to image-based faunal analyses. It is one of numerous initiatives to improve biodiversity data input, and thereby the reliability of biological datasets and their utility in informing policy and management. Image-based faunal analyses are increasingly common but have limitations in the level of taxonomic precision that can be achieved, which varies among groups and imaging methods. This is particularly critical for deep-sea studies owing to the difficulties in reaching confident species-level identifications of unknown taxa. ON signs indicate a standard level of identification and improve clarity, precision and comparability of biodiversity data. Here we provide examples of recommended usage of these terms for input to online databases and preparation of morphospecies catalogues. Because the processes of identification differ when working with physical specimens and with images of the taxa, we build upon previously provided recommendations for specific use with image-based identifications.


INTRODUCTION
Improved technology and approaches for surveying the marine environment have led to a rapid increase in the number of in situ images of both shallow water and deep-sea taxa that are now being used for biodiversity studies (Durden et al., 2016). The appropriate identification of organisms in these images is critical to scientific, environmental management, and conservation assessments (Durden et al., 2016; Thomson et al., 2018). Increasingly, these image-based methods are being used to supplement or replace traditional approaches, in scientific research, in baseline surveys of an ecosystem, and in repeat monitoring programmes. Such assessments are used to support applications for the industrial extraction of marine resources, e.g., for oil and gas (Gates et al., 2017); seabed mining (Durden et al., 2018; Simon-Lledó et al., 2020); and fisheries (Clarke et al., 2009; Murphy and Jenkins, 2010). This approach is also used in conservation assessment (Bean et al., 2017), and in the monitoring of Marine Protected Areas (MPAs; Benoist et al., 2019). In all cases, these efforts require the accurate and repeatable monitoring of biodiversity data (Huvenne et al., 2016).
One major challenge for the identification of organisms in images is a lack of knowledge of the local fauna. Image-based surveys are often used to study remote locations, such as the deep sea, where extensive knowledge of the fauna is often limited (e.g., Jones et al., 2014). In deep-sea regions globally, high proportions of species are new to science, with estimates varying from 35 to 95% (Poore et al., 2015). Consequently, this has led to the development of local or regional field guides (and catalogues) describing the fauna recorded on images or videos, including example images with textual descriptions of the characteristics that helped identify the organisms, frequently without access to corresponding physical specimens (Tilot, 2006; Gowlett-Holmes, 2008; Jones and Gates, 2010; Gervais et al., 2012; HURL, 2013; Jacobsen Stout et al., 2015; National Oceanic and Atmospheric Administration, 2020).
Such catalogues aim to improve consistency between operators and surveys, and provide a morphological taxonomy of the putative taxa encountered (Amon et al., 2017). However, these catalogues lack consistency, are difficult to combine, have missing details, are often only housed locally, and are rarely machine readable without significant additional effort. This greatly reduces the utility of the information from these surveys, and therefore limits their scientific value. There are coordinated international efforts to develop databases and reference guides for the identification of marine species from images within a region. One example is the Standardised Marine Taxon Reference Image Database (SMarTaR-ID), which takes the North Atlantic deep sea as a case study (Howell et al., 2019).
In situ images provide contextual advantages over ex situ photographs and preserved specimens, including information about natural habitat, colour in life, behaviour, interspecific associations, and other ecological data (particularly useful for relation of a species to habitat). Additional features from images, such as shadows, may also be useful in detection of taxa and for identification. In some cases, published works contain taxonomic identifications from physical specimens together with corresponding in situ photographs, though such works are comparatively rare (Rogacheva et al., 2013).
There are numerous inherent challenges to the identification of taxa from images without a corresponding specimen (Howell et al., 2014, 2019; Durden et al., 2016). For example, image orientation may mean that features normally used for identification with a physical specimen are not visible. Organisms may appear rotated, overturned, and may be retracted or in unusual positions, while sessile animals may have only a single plane of view (e.g., dorsal rather than lateral or ventral), so diagnostic features may be hidden. Particular difficulties arise when diagnostic features are internal or on the ventral or lateral view. Similarly, the quality of the image may impact the identification, and is determined by a number of factors including the environmental conditions (e.g., turbidity and backscatter), distance from the camera to the subject, illumination, camera and/or video settings and platform (e.g., towed-platform, autonomous underwater vehicles, remotely operated vehicles, and baited remote underwater video systems) (Durden et al., 2016). Improvements in platform stability, camera and/or lighting technology may result in different taxonomic resolution between datasets over time (Macreadie et al., 2018). Some image annotation software, commonly used for marine studies, such as the Video Annotation and Reference System (Schlining and Stout, 2006), SQUIDLE+ (Williams and Friedman, 2015) and BIIGLE (Langenkämper et al., 2017), provide tools for recording levels of certainty of identification, such as classifying identifications as 'certain,' 'provisional,' or 'unconfirmed' (Durden et al., 2016), although explanations of these terms and of how and when to apply them are lacking.
Given these limitations, it is common practice in the interpretation of image data to identify taxa to a taxonomic level higher than species (e.g., genus, family or higher rank), with individual taxa referred to as 'morphospecies,' 'morphotypes,' or given operational taxonomic unit (OTU) reference codes, rather than conventional taxonomic names. These OTU reference codes are often used instead of binomial Linnaean names to refer to organisms in publications, databases, and morphospecies image catalogues (Althaus et al., 2015; Howell et al., 2019). Minelli (2019) refers to a 'Galaxy of non-Linnaean names,' outlining the variety of non-Linnaean names currently in use, and the problems arising in biodiversity informatics as a result of the lack of standardisation of their usage. Conventional Linnaean names are governed by codes of nomenclature: the International Code of Zoological Nomenclature (International Commission on Zoological Nomenclature [ICZN], 1999); the International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code; ICN, Turland et al., 2018); the International Code of Nomenclature of Prokaryotes (ICNP, Parker et al., 2019); and the International Committee on the Taxonomy of Viruses (ICTV, Lefkowitz et al., 2018). Minelli (2019) divided non-code compliant names into three groups: (1) open nomenclature; (2) temporary names for undescribed species; and (3) 'mixed lists' where formal Linnaean names are mixed with informal names [e.g., in databases such as the Barcode of Life Data System, BOLD, 2020 1 (Ratnasingham and Hebert, 2007); and GenBank, 2020 2 (Benson et al., 2008)]. Informal names are not currently governed by any code, resulting in a wide array of usage, and greatly complicating attempts to make comparisons between datasets.
There are also several issues with the data available in genetic databases, a key one being the increasing number of sequences deposited without reference to formal scientific names, which has resulted in an explosion of 'dark taxa' (Page, 2016). This is a substantial problem that results in the generation of additional 'taxonomic entities' with limited scientific meaning.
Open Nomenclature signs, hereafter ON signs, are commonly used in taxonomic, ecological, and biodiversity studies, and are extensively used in the designation of morphospecies and OTUs. They provide a means to explain the uncertainty of an identification. For example, in 'Brisinga sp.' the addition of the ON sign 'sp.' after the genus name indicates that this entity is identified as a species within the genus Brisinga, but that the species is not known (which may be for a variety of reasons). Sigovini et al. (2016) provided a review of the history, a thorough discussion, and an updated list of recommended open nomenclature signs, as well as some preliminary suggestions for the standardisation of their use when a physical specimen is available.
Despite the very useful recommendations by Sigovini et al. (2016) in providing clear definitions of the ON signs in current usage, it has become apparent that the implementation of these terms in practice is not well known or understood by many who use them, particularly those working on image-based identifications. There remains a need for clearer recommendations for the use of ON signs for this purpose. This need for improved guidance in ON use arises alongside moves toward improved biodiversity data standards to facilitate access to and comparability among datasets online through a variety of openly accessible global databases, and meeting it is an essential step to ensure that biodiversity data generated from marine images can meet FAIR data principles (Findable, Accessible, Interoperable, and Reusable; Wilkinson et al., 2016). There are many such efforts underway with guidance on the standardisation of names being discussed and implemented by a variety of global databases and working groups (Welter-Schultes, 2012; Vandepitte et al., 2015, 2018; Horton et al., 2017; TDWG, 2020 3 ).
There are several online data repositories holding primary biodiversity data, with differing applications, focussing separately on sequence data or species occurrences (Table 1 in Rabone et al., 2019 provides a comprehensive list and links). These include GenBank and BOLD, Catalogue of Life (CoL 4 ; Roskov et al., 2020), the World Register of Marine Species (WoRMS 5 ; WoRMS Editorial Board, 2020), the Global Biodiversity Information Facility (GBIF 6 ; GBIF, 2020), and the Ocean Biodiversity Information System (OBIS 7 ; OBIS, 2020). One of the most commonly used standards for sharing information about biodiversity online is Darwin Core 8 (Wieczorek et al., 2012), which is maintained by the organisation Biodiversity Information Standards (TDWG). The Darwin Core Archive data package (DwC-A) is used for most datasets that are input to biodiversity databases, with most data providers using GBIF's Integrated Publishing Toolkit 9 .
In this article we primarily draw on our experiences in dealing with images from the deep-sea environment, where the problems are particularly acute. Nevertheless, the same issues arise throughout the marine, and indeed terrestrial, environment.
There are numerous reasons why an image-based identification may not be able to provide any given specimen with a full binomial Linnaean name. Our recommendations for ON usage may, therefore, be applicable in numerous applications, from microscopic imagery of the plankton (Culverhouse et al., 2014) to baited stereo-video surveys of fish assemblages (Langlois et al., 2020) to aerial photographs of marine mammals (Schweder et al., 2010).
We present recommendations for the best practice of ON usage for taxon identification from images, including a discussion of usages of ON signs and those to avoid, and suggestions for integration with the Darwin Core. We present a flowchart for decision-making of ON signs to use, and provide clear examples for illustration from marine image datasets.

DIFFERENT USAGES OF ON SIGNS

A Flowchart to Aid ON Sign Selection
In considering the means to standardise the application of ON signs to an identification, it is important to recognise that ON signs are used for a variety of applications (Minelli, 2019). There are currently three main types of ON sign usage employed in association with image-based identification: (1) nomenclature applied to an individual taxon in a single image to be used in a publication and/or for entry to a database in a standardised format (e.g., Darwin Core); (2) nomenclature applied to one or more specimen images in a catalogue or morphospecies guide, with explanations provided in the text (e.g., figure legends, titles of sections on each morphospecies); and (3) nomenclature applied to a group of specimens for the purposes of data reporting or formation of a data matrix for statistical analysis. In the last case, this could involve the amalgamation and refinement of the nomenclature applied to associated individual specimens and may include taxonomic roll-up to higher taxonomic ranks (more confident identifications) and/or morphotype complexes. The decision of which ON sign to use may vary depending on the final application. Sigovini et al. (2016) provided a flow chart to indicate which ON sign should be used for a particular level of certainty in identification. That flow chart can be applied at lower taxonomic ranks of identification (higher resolution/more detail), both when a specimen is available, and for an image-based identification. However, image-based identifications pose their own challenges and are more often made at higher taxonomic ranks (lower resolution/less detail) than identifications of physical specimens. In our assessments of a broad range of deep-sea images and taxa, attempts to apply ON signs to images (using Sigovini et al., 2016) frequently resulted in uncertainties in usage rather than improved clarification and consistency in the datasets.
Consequently, we have developed a similar method that has been specifically adjusted to use for image-based identification. The simple flowchart we provide, Figure 1, has been trialled and refined by both taxonomic experts and variously experienced image annotators. Following these trials, it is important to note that we now recommend that the usage of certain ON signs should be limited to physical specimens only.
FIGURE 1 | Key steps to identification of taxa in images. The flowchart provides a means to determine the lowest rank of identification with certainty, and provides an example of the ON sign to use in each case. For each ON sign the identifier can check Figures 2, 3, 5, 6 for examples formatted for output in Darwin Core terms. *There are various reasons for using stet., which will depend on the operator applying the ON sign. The reason for stopping should be given in the identificationRemarks field, e.g., no experience of ID of this taxon (indicating it may be possible for another identifier with more experience of the group to take the identification further). **incerta can be used at any level of the identification (phylum inc., class inc., order inc., fam. inc., gen. inc., sp. inc.). It indicates the identification is not absolutely certain and should be added after the rank of uncertainty (e.g., Brisinga gen. inc.; Brisinga costata sp. inc.). If uploading the occurrence data to GBIF/OBIS, the lowest taxonomic rank should be entered in the scientificName field and the ON sign in the identificationQualifier field. ***The use of 'sp. nov.' should be avoided for input to the field identificationQualifier as it is required by nomenclatural codes to explicitly indicate a new species, and may also result in non-unique identifiers. The use of a unique sequential, alphanumeric code for new taxa is recommended for entry to the identificationQualifier field (see text and examples in Figures 2, 3, 6) with further information about the new taxon held in the identificationRemarks field.

Open Nomenclature Signs in Darwin Core Format
Examples of the usage of each ON sign are provided here for input in the Darwin Core format for entry to online databases and for peer-reviewed publications. The Ocean Biodiversity Information System (OBIS) provides a manual for the entry of taxonomic and identification data to Darwin Core in the database fields 'scientificName' and 'identificationQualifier' as follows: ". . .'scientificName' (required term). . . The name should be at the lowest possible taxonomic rank, preferably at species level or lower, but higher ranks, such as genus, family, order, class etc. are also acceptable. . .The scientificName term should only contain the name and not identification qualifications (such as ?, confer or affinity), which should instead be supplied in the identificationQualifier term. . ." 10 .
Darwin Core format therefore allows for the incorporation of ON signs in the field identificationQualifier, and also for the inclusion of remarks about an identification in the identificationRemarks field 11 . Darwin Core also includes a taxonConceptID field, defined as "An identifier for the taxonomic concept to which the record refers - not for the nomenclatural details of a taxon", that can be used to form a namestring that combines scientificName and identificationQualifier. In the case of non-code compliant names, the use of a namestring or coding that is a combination of the scientificName and the identificationQualifier similarly becomes the taxonConceptID and therefore we recommend this field for these entries. It is most important that these fields are used in a standardised way in biodiversity informatics, in publications, and in field guides or morphospecies catalogues.
10 http://obis.org/manual/darwincore/
11 dwc.tdwg.org/terms/
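As an illustration of how these three fields relate, the following minimal Python sketch holds one identification and derives the namestring for taxonConceptID by joining scientificName and identificationQualifier. The class name `Identification` and the method `to_dwc` are hypothetical names introduced here, and joining with a single space is an assumption rather than a rule stated in Darwin Core.

```python
from dataclasses import dataclass, asdict


@dataclass
class Identification:
    """One image-based identification, using Darwin Core term names as fields."""
    scientificName: str                 # lowest rank identified with certainty
    identificationQualifier: str = ""   # ON sign, e.g. "stet." or "gen. inc."
    identificationRemarks: str = ""     # free-text reason for the ON sign

    @property
    def taxonConceptID(self) -> str:
        # Assumption: the namestring is name + qualifier joined by one space.
        parts = (self.scientificName, self.identificationQualifier)
        return " ".join(p for p in parts if p)

    def to_dwc(self) -> dict:
        """Return the record as a plain dict keyed by Darwin Core terms."""
        row = asdict(self)
        row["taxonConceptID"] = self.taxonConceptID
        return row


record = Identification(
    scientificName="Ostracoda",
    identificationQualifier="stet.",
    identificationRemarks="no experience in identifying this taxon",
)
print(record.to_dwc())
```

Keeping the ON sign out of scientificName, as the OBIS manual requires, means the name can still be matched against a taxonomic register while the namestring remains available for grouping records.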
We have assessed the number of unique values of these three Darwin Core terms. Currently there are 1,048,576 records from 499 datasets that use at least one of these three DwC terms in OBIS. There are 4315 unique values of identificationQualifier, 411 unique values of identificationRemarks, and 3623 unique values of taxonConceptID.
While we would expect numerous unique values to appear in the identificationRemarks field, it is clear that this field is not very well used. There are very large numbers of unique values in the other two fields suggesting that to date, there has been little attempt to standardise these data.
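A tally of this kind can be sketched in a few lines of Python. This is not the procedure used to obtain the figures above; it only illustrates the idea on a handful of invented records, and the function name `summarise_terms` is hypothetical.

```python
from collections import Counter

TERMS = ("identificationQualifier", "identificationRemarks", "taxonConceptID")


def summarise_terms(records):
    """Count records using at least one term, and unique values per term."""
    records_using_any = 0
    values = {term: Counter() for term in TERMS}
    for record in records:
        used = False
        for term in TERMS:
            value = (record.get(term) or "").strip()
            if value:
                values[term][value] += 1
                used = True
        if used:
            records_using_any += 1
    unique_counts = {term: len(counter) for term, counter in values.items()}
    return records_using_any, unique_counts


# Invented example records, for illustration only.
example_records = [
    {"scientificName": "Brisinga", "identificationQualifier": "gen. inc."},
    {"scientificName": "Brisinga", "identificationQualifier": "gen. inc.",
     "identificationRemarks": "identification uncertain"},
    {"scientificName": "Ostracoda", "identificationQualifier": "stet.",
     "taxonConceptID": "Ostracoda stet."},
    {"scientificName": "Colus"},
]

n_records, n_unique = summarise_terms(example_records)
print(n_records, n_unique)
```

A high ratio of unique values to records in identificationQualifier, as reported above, is the signal that the field is being filled with free text rather than a controlled set of ON signs.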

DISCUSSION OF ON SIGNS TO USE/AVOID FOR IMAGE-BASED IDENTIFICATION
We have provided in Figures 2, 3 some clear examples of the recommended usage of open nomenclature and how it should be applied in Darwin Core format. Importantly, the usage in different formats (biodiversity informatics, publications, field guides, and morphospecies catalogues) may drive the decision on which ON sign to use.

Species (Singular), sp.
Species rank is the basic, gold-standard, taxonomic level to which ecological studies generally aspire. However, in many image-based identifications it is not possible to know, with certainty, the species-level identity of an organism. In such cases, the taxon is usually given the ON sign 'sp.' i.e., Genus sp. The use of this ON sign alone is discouraged, as it does not indicate the reason that the identification was not determined to the species level. As indicated by Sigovini et al. (2016), we recommend that the term be supplemented with an additional qualifier, either stet. (stetit) or indet. (indeterminabilis) [see sections "Stetit (stet.)" and "Indeterminabilis (indet.)"]. Where neither of these terms apply, for example in the case of confirmed, but undescribed new species, a unique taxon identifier code (e.g., Eurythenes sp. DISCOLL.PAP.JC165.674) should be used. Simple alphanumeric codes are commonly encountered in both publications and databases (e.g., Eurythenes sp. 1 or Eurythenes sp. A) and should be avoided. Such simple codes are unlikely to remain unique identifiers beyond the dataset in question. By providing a more complex coding system relating to a collection or sample number (for physical specimens), or expedition/dive number/time stamp (for image-based taxa) when combined with the higher taxon, the namestring used becomes a unique identifier for that OTU. Chapman (2005) referred to this issue as 'Domain Schizophrenia' and emphasised the importance of establishing a formula to produce 'unpublished names' that remain unique once databases begin to be combined. For example, the unique species code could take the form adopted by the Australian Botanical community: <Genus> sp. <colloquial name or description> (<voucher>) (see Chapman, 2005); or could be formulated as is currently being recommended in the OBIS manual for creating a unique code for the Darwin Core field occurrenceID: "There are no guidelines yet on designing the persistence of this ID, the level of uniqueness (from dataset to global) and the precise algorithm and format for generating the ID, but in the absence of a persistent globally unique identifier, one could be constructed by combining the institutionCode, the collectionCode and the catalogNumber (or autonumber in the absence of a catalogNumber)" (Figure 4).
We therefore recommend that for taxa confirmed as new to science, temporary names be constructed for input to the 'identificationQualifier' field, and where physical specimens are available, the same temporary name should be used on specimen labels, input to museum databases and genetic sequence databases etc. thus facilitating links between these important datasets. The name construction can be managed by using a combination of the institution, museum or collection code (e.g., as found on the Global Registry of Scientific Collections 12 ), and the sample number or museum accession number, or a combination of the expedition/dive number/time stamp (for image-based taxa).
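The construction of such a temporary name can be sketched as follows. The sketch assumes that the code components are joined with full stops, following the pattern of the example Eurythenes sp. DISCOLL.PAP.JC165.674 given above; the function names and the expedition, dive and time-stamp values in the second call are hypothetical.

```python
def temporary_qualifier(*components):
    """Build the identificationQualifier string 'sp. <code>' from code parts.

    Components might be institution/collection code and sample or accession
    number (physical specimens), or expedition, dive number and time stamp
    (image-based taxa). Empty components are skipped.
    """
    tokens = [str(c).strip() for c in components if c is not None and str(c).strip()]
    if not tokens:
        raise ValueError("at least one non-empty code component is required")
    # Assumption: components are joined with '.', as in the example in the text.
    return "sp. " + ".".join(tokens)


def temporary_name(higher_taxon, *components):
    """Full namestring: higher taxon followed by the temporary qualifier."""
    return f"{higher_taxon} {temporary_qualifier(*components)}"


# Specimen-based code, reproducing the example from the text.
print(temporary_name("Eurythenes", "DISCOLL", "PAP", "JC165", "674"))

# Image-based code; expedition, dive and time stamp are invented values.
print(temporary_name("Eurythenes", "EXP01", "DIVE05", "20200101T120000Z"))
```

Because the same function produces the string for specimen labels, museum databases and the identificationQualifier field, the links between those datasets depend only on the code components being recorded consistently.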

Species (Plural) spp.
The ON sign 'spp.' is used to indicate the presence of more than one species of the same genus, whose identification was not achieved (Sigovini et al., 2016). The usage of 'spp.' applied to image-based identifications depends much on the planned output (see section "Different Usages of ON Signs"). In single image identifications for upload to online databases in Darwin Core, usage of 'spp.' is discouraged as clearly a single specimen cannot be identified to multiple species. In these cases, we recommend that sp. indet. [see section "Indeterminabilis (indet.)"] is used.
For use in morphospecies catalogues or in analyses, Durden et al. (2016) advocate for the use of spp. to indicate a species complex, noting that "of forty rockfish species (Sebastes spp.), five are visually very similar unless an extreme close-up view of the gill cover and erect dorsal fin are obtained. All five species can be listed as separate terms, along with an additional term 'Sebastes complex, ' for use when species-level identification is not appropriate, but where species-level identification can also contribute to 'Sebastes complex' quantification." However, for reporting or analysis purposes, taxa may need to be merged where consistent identification or discrimination of those taxa was not possible across all the images annotated, or to enable comparison with other datasets. For example, two morphotypes can be identified as belonging to the same taxonomic group (e.g., genus or family) based on visible characteristics, but the identifier is unable to distinguish these two morphotypes consistently in the entire set of images, because the distinguishing features are not always visible (e.g., owing to variation in image quality and altitude of the camera). In such cases it is reasonable to merge the taxa to the next higher taxonomic rank (also called taxonomic roll-up) and to label this new merged group, which obviously contains more than one taxon, Family spp. or Genus spp. An example of this usage is provided in Figure 5.  Similarly, in the case of an image catalogue, where a series of photographed specimens of two or more taxa that cannot be discriminated confidently are provided, the use of spp. is reasonable. 
In these cases, the series of photographed specimens is provided with an identification qualifier, such as Colus spp., to indicate that there are two (or more) known species in the region and these taxa cannot be separated with confidence, and is usually accompanied by some notes indicating the difficulties of image-based identification, along with further information about the known species in the region (Jones and Gates, 2010; Fourt et al., 2017; Stefanoudis et al., 2018). Sigovini et al. (2016) indicate the use of fam. sp. and fam. gen. sp. (which can also be abbreviated to e.g., Zoarcidae sp. and Nematoda sp.), which has the same ON sign meaning as 'sp.' alone, and indicates that the taxon has not been identified beyond that higher taxonomic rank. For greater consistency of datasets, just as Genus sp. without an explanatory ON sign is discouraged, so are Zoarcidae gen. sp. or Nematoda fam. gen. sp.: they give no indication as to why the identification stopped at that level and therefore should not be input to the identificationQualifier field. This may be particularly important for image-based identification, as confident taxonomic identification is often limited to higher taxonomic ranks than are possible if we have a specimen at hand, better image resolution, or different planes of view. In such cases, we recommend that stet., inc., or indet. are incorporated into the identificationQualifier [see sections "Stetit (stet.)," "Indeterminabilis (indet.)," and "Incerta (inc.)"].
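The taxonomic roll-up described in this section, in which morphotypes that cannot be separated consistently are merged and relabelled Genus spp. or Family spp. for reporting or analysis, can be sketched as a simple relabelling of counts. The morphotype labels, counts and the function name `roll_up` below are invented for illustration.

```python
from collections import defaultdict


def roll_up(counts, merge_map):
    """Merge morphotype counts into broader groups.

    counts:    {label: number of individuals annotated}
    merge_map: {label: merged label}; labels not listed are kept unchanged.
    """
    merged = defaultdict(int)
    for label, n in counts.items():
        merged[merge_map.get(label, label)] += n
    return dict(merged)


# Invented counts for two morphotypes that could not be told apart in
# every image, plus one taxon that is left as it is.
counts = {
    "Colus sp. SURVEY1.0001": 12,
    "Colus sp. SURVEY1.0002": 7,
    "Ostracoda stet.": 3,
}
merge_map = {
    "Colus sp. SURVEY1.0001": "Colus spp.",
    "Colus sp. SURVEY1.0002": "Colus spp.",
}

print(roll_up(counts, merge_map))
```

Applying the merge at the reporting stage, rather than overwriting the original annotations, keeps the finer identifications available for datasets where image quality did allow the morphotypes to be separated.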

Stetit (stet.)
This ON sign means it stood/stays or remained here, indicating the identification stopped at this taxonomic rank. Stetit can be employed for a variety of reasons, which require clarification. It may be used in the sense of Sigovini et al. (2016), indicating that it is a choice to go no further, i.e., "I called this taxon 'Ostracoda stet.' because I did not attempt to identify the ostracods any further; I simply noted they were ostracods and stopped there." Alternatively, the identifier may have been unable to take the identification further, so the identification stayed at that rank, i.e., "I called this taxon 'Ostracoda stet.' because although I made every attempt to identify the ostracods to a lower taxonomic rank, I did not have the expertise/time/identification resources available, so Ostracoda was the lowest rank to which I could identify the taxon." In many cases, image annotators will be providing identifications for images across a range of phyla, and by using 'stet.' in a dataset, image-based identifications with this ON sign can then be easily collated, and sent to a taxonomic expert who may be able to provide a more precise identification to a lower taxonomic rank. It is therefore important that the reason for stopping is recorded. This can be done via the identificationRemarks field, or in the text of a morphospecies catalogue, e.g., 'no experience in identifying this taxon' (indicating it is possible to take the identification further). In cases where the identifier wishes to indicate that the same (or another) image annotator or indeed an expert taxonomist would be unable to identify the same image-based taxon further, then the ON sign should be 'indet.' [see section "Indeterminabilis (indet.)"].
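Collating 'stet.' records for referral to a taxonomic expert can be sketched as a filter on identificationQualifier followed by grouping on scientificName. The records, the occurrenceID values and the function name `collate_stet` are invented for illustration, and matching on the qualifier ending in 'stet.' is an assumption about how the field has been filled.

```python
from collections import defaultdict


def collate_stet(records):
    """Group records whose ON sign is 'stet.' by the taxon they stopped at."""
    by_taxon = defaultdict(list)
    for record in records:
        qualifier = record.get("identificationQualifier", "").strip()
        # Assumption: 'stet.' appears at the end of the qualifier string.
        if qualifier.endswith("stet."):
            by_taxon[record["scientificName"]].append(record["occurrenceID"])
    return dict(by_taxon)


# Invented annotation records.
annotations = [
    {"occurrenceID": "img-001", "scientificName": "Ostracoda",
     "identificationQualifier": "stet.",
     "identificationRemarks": "no experience in identifying this taxon"},
    {"occurrenceID": "img-002", "scientificName": "Brisinga",
     "identificationQualifier": "sp. indet.",
     "identificationRemarks": "diagnostic characters not visible"},
    {"occurrenceID": "img-003", "scientificName": "Ostracoda",
     "identificationQualifier": "stet.",
     "identificationRemarks": "no experience in identifying this taxon"},
]

print(collate_stet(annotations))
```

Records marked 'indet.' are deliberately excluded, since that sign states that no identifier could take the identification further from the image.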

Indeterminabilis (indet.)
We follow the recommendation of Sigovini et al. (2016) that the ON sign indet. is taken to mean that the taxon is indeterminable beyond a certain taxonomic level. For the cases considered by Sigovini et al. (2016), this inability to identify a specimen further was considered to result from the deterioration or lack of diagnostic characters, particularly in the case of damaged material or partial specimens. This is equally relevant to image-based identifications, where diagnostic characters are often not visible or resolvable in the image, which could be owing to the resolution of the image, or the orientation of the taxon in the field of view. This ON sign can be applied at any taxonomic rank.

Incerta (inc.)
Sigovini et al. (2016) recommended the ON sign 'inc.' to indicate 'uncertain identification' and to replace the use of the question mark symbol '?' which is considered as a variable character (wildcard) by most computing software. In image-based identifications, this ON sign can be used at all taxonomic levels (e.g., Aristidae fam. inc.), while it is less likely to be used at higher taxonomic ranks when a physical specimen is available. Incerta differs from indeterminabilis [see section "Indeterminabilis (indet.)"] which is used where the identifier cannot identify further as the characters are not visible/present, whereas incerta is to be used where characters are visible, but the identification remains uncertain. Even if the taxon in an image is clearly identifiable, absolute certainty may not always be possible. Incerta can be used at any level of the identification and since it indicates the identification is not absolutely certain, it should be added after the rank of uncertainty (e.g., Chimaera gen. inc.; Hydrolagus trolli sp. inc.). The choice of which taxonomic rank to enter into the scientificName field will depend on the level of certainty of the identifier. We have provided two examples of the possible different entries that may be needed for an 'image annotator' versus a 'taxonomic expert' in Figure 6. Such distinctions will allow these tentative identifications to be easily collated for later identification by an expert, as for the 'stet.' example.
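The placement of 'inc.' after the rank of uncertainty can be sketched with a small lookup of rank abbreviations, taken from the list in the Figure 1 caption (phylum inc., class inc., order inc., fam. inc., gen. inc., sp. inc.). How the result is split between scientificName and identificationQualifier below is an assumption, and the function name `incerta` is hypothetical.

```python
# Rank abbreviations as listed in the Figure 1 caption.
RANK_ABBREVIATION = {
    "phylum": "phylum",
    "class": "class",
    "order": "order",
    "family": "fam.",
    "genus": "gen.",
    "species": "sp.",
}


def incerta(name, rank):
    """Return Darwin Core style fields for an uncertain identification.

    name: the taxon name at the rank where the uncertainty lies
    rank: one of the keys of RANK_ABBREVIATION
    """
    try:
        abbreviation = RANK_ABBREVIATION[rank]
    except KeyError:
        raise ValueError(f"unsupported rank: {rank!r}") from None
    qualifier = f"{abbreviation} inc."
    return {
        "scientificName": name,
        "identificationQualifier": qualifier,
        # Assumption: the namestring is name + qualifier joined by a space.
        "taxonConceptID": f"{name} {qualifier}",
    }


print(incerta("Chimaera", "genus")["taxonConceptID"])
print(incerta("Hydrolagus trolli", "species")["taxonConceptID"])
print(incerta("Aristidae", "family")["taxonConceptID"])
```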

Species affinis (sp. aff.)
Imaged specimens are commonly identifiable to an entity close to, or with an affinity to, a known taxon (family, genus or species), but with clear distinction from it. This ON sign is often used in the taxonomic literature to signify that the identifier believes the taxon to be a new species, such as 'Eurythenes sp. nov. aff. sigmiferus'. We recommend that the term 'sp. nov.' is not used in the identificationQualifier field because, as indicated in Sigovini et al. (2016), its use is a nomenclatural act. Although use of this ON sign in online datasets and catalogues does not make a name available with respect to the current codes of nomenclature (International Commission on Zoological Nomenclature [ICZN], 1999; ICN, Turland et al., 2018; ICNP, Parker et al., 2019), we recommend that the terms 'sp. nov.' and 'sp. nov. aff.' be avoided to prevent confusion. Information about the potential new-to-science taxon may be included in the identificationRemarks field. The terms may be used in published papers and morphospecies catalogues, where a taxon is believed to be new but is not described, as this commonly occurs with image-based identifications, particularly in deep-sea studies (Pawson et al., 2015). The examples in Pawson et al. (2015) are written as Paroriza ? new species, which could cause problems if input in this format to global biodiversity databases. Therefore, we recommend that confirmed new taxa should be amended to the format Paroriza sp. [unique123] aff. pallens, including a description of why the taxon differs from P. pallens (Koehler, 1895) and is considered new to science, in the identificationRemarks field or in the text, as in Pawson et al. (2015). Duffy et al. (2016) referred to 'Paracallisoma sp. 6' and there is a discussion in the text indicating that it is likely to be a new taxon. The simple alphanumeric code used by the authors is not recommended [see section "Species (Singular), sp."]; a better option would have been 'Paracallisoma sp. DISCOLL.56761'. Using a unique namestring for probable new taxa when referring to them in publications and databases provides a consistent means of referring to them, allows them to be clearly referenced in later papers, including in the synonymy of a new species description, and provides links between datasets.

Confer (cf.)
As indicated in Sigovini et al. (2016), cf. is from the Latin confer, meaning to be compared with. The use of this ON sign indicates that the identifier cannot be certain of the identity of the species (or higher taxonomic rank) until a more detailed comparison can be made, for example with type or reference material, or until a taxonomic expert has been consulted. This ON sign is very widely used in image-based identifications to indicate that the identifier believes the taxon to be similar to, or most likely the same as, a certain species (or higher taxonomic rank), but cannot say for certain without further study of more images or a physical specimen. Since the terms cf. and aff. are often confused and their current usage is inconsistent, the term cf. is discouraged in application to image-based identifications, and in particular for use in online datasets. Where diagnostic features are unclear, the lowest level of identification should be moved up a rank to, e.g., Calamocrinus sp. indet. (or Calamocrinus diomedae sp. inc.) instead of Calamocrinus cf. diomedae, and the information regarding the likely identity of the species should be included in the identificationRemarks field and/or in the corresponding text section of an image catalogue.
There are cases where confer is used for image-based identifications in the true sense. In Simon-Lledó et al. (2019) the image of Bathystylodactylus is referred to as Bathystylodactylus cf. echinus. This is because a specimen was available and, even after study of the specimen and consultation with a taxonomic expert, the authors' determination remained that comparison with more material was indeed necessary (pers. comm. Sammy De Grave, Oxford University Natural History Museum).
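The recommendation to move an uncertain image-based identification up a rank can be sketched in a few lines of Python. This is a hypothetical helper, not part of any existing tool: the function name, the regular expression and the wording of the remark are all assumptions, and it handles only simple 'Genus cf. epithet' strings. It should be applied only where the analyst has judged that cf. was not used in the true sense described above.

```python
import re

# Matches simple strings of the form 'Genus cf. epithet' (illustrative only).
CF_PATTERN = re.compile(r"^(?P<genus>[A-Z][a-z]+)\s+cf\.\s+(?P<epithet>[a-z]+)$")


def move_cf_up_a_rank(name):
    """Convert 'Genus cf. epithet' into a genus-level 'sp. indet.' record.

    The likely species is kept, but only as free text in identificationRemarks.
    """
    match = CF_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a 'Genus cf. epithet' string: {name!r}")
    genus, epithet = match["genus"], match["epithet"]
    return {
        "scientificName": genus,
        "identificationQualifier": "sp. indet.",
        "identificationRemarks": (
            f"Most similar to {genus} {epithet}; "
            "diagnostic features not visible in the image."
        ),
    }


converted = move_cf_up_a_rank("Calamocrinus cf. diomedae")
print(converted["scientificName"], converted["identificationQualifier"])
```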

DISCUSSION
When identifying taxa from images, there are numerous challenges (Durden et al., 2016), but the decisions on how to indicate a standard level of identification, enabling comparisons between datasets, have not been explored in detail to date. There will always be a degree of uncertainty with taxon identifications made solely from images (i.e., without a corresponding specimen), and while this cannot be eliminated from image-based biodiversity datasets, the provision of a robust decision mechanism to standardise and clarify the uncertainty will improve the subsequent use and comparability of datasets.
In this article, we recommend the use of consistent open nomenclature, as commonly applied with physical specimens and recently updated and clarified by Sigovini et al. (2016), to provide a robust set of standard terms for use in image-based identifications.
Today numerous published datasets and papers make use of ON signs. Yet, there is currently much confusion and little evidence of standardisation in the usage of these ON signs, or even explanations as to what is meant by them. The current lack of standardisation presents a clear risk for the future use of many datasets.
There are some good examples of consistent use of ON signs, or at least, clear explanations of the usage in particular cases. For example, Glover et al. (2016) in their paper on the Echinodermata from the Clarion-Clipperton Zone, indicate that for a species "similar to a morphologically well-defined species name where we lack comparable genetic data from type material or from the type locality, or when genetic data previously published in Genbank is incompatible with ours, we use the open nomenclature expression "cf."." In their dataset, the morphological identifications are provided with a clear coding, e.g., Asteroidea sp. (NHM_054) or Freyastera cf. benthophila, the meaning of which is clearly defined in the text of their paper.
The decision on which ON sign to use in image-based identifications can be difficult, particularly when working with taxonomic experts who are accustomed to making species-level determinations and to using ON signs for physical specimens. In cases where the expert taxonomic opinion is 'I cannot be certain, as I cannot see the necessary morphological characters, but it looks most like the species Xus yus', that taxon must be regarded as indeterminabilis (it cannot be determined further) and should be recorded as Xus indet. However, Xus cf. yus is regularly used for such cases in image-based identifications, and this ON usage should be discouraged.
As we have detailed in this article (see section "Discussion of ON Signs to Use/Avoid for Image-Based Identification"), appropriate and consistent use of open nomenclature and Darwin Core terms enables both: (a) secure biodiversity data consistent with the FAIR principles (Findable, Accessible, Interoperable, and Reusable; Wilkinson et al., 2016); and (b) the ability to capture and record additional information on potential identifications and putative new taxa without compromising the immediate reusability of initial standardised identifications.
In the case of the image-based identifications of deep-sea taxa from which we have developed our ideas, we recommend the uniform banking of such data with the Ocean Biodiversity Information System (OBIS). It is clear from our brief analysis of the current usage of the relevant Darwin Core terms in OBIS that there is already a critical need to provide guidance on the unambiguous and standardised entry of ON signs into the right data fields. Annotation software could incorporate these ON signs to facilitate outputs with suitable references to the certainty of identifications in images. Our recommendations are equally applicable in comparable shallow-water operations, and indeed in non-marine settings. We hope that this article will help both taxonomic experts and image analysts to make informed choices in applying ON signs in the future, and thus improve the quality, comparability, and longevity of biodiversity datasets.
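One way annotation software or a data-entry script might support these recommendations is a simple check on the identificationQualifier value before upload. The Python sketch below is an assumption-laden illustration rather than an existing OBIS or Darwin Core tool: the function name, the patterns and the messages are hypothetical, and the list covers only the signs discouraged in this article for online datasets ('sp. nov.', 'sp. nov. aff.', 'cf.' and the '?' format).

```python
import re

# (pattern, message) pairs for ON signs discouraged in online datasets.
# Patterns and messages are illustrative assumptions, not a standard list.
DISCOURAGED_SIGNS = [
    (re.compile(r"\bsp\.\s*nov\."),
     "'sp. nov.' should be avoided; describe the putative new taxon "
     "in identificationRemarks"),
    (re.compile(r"\bcf\."),
     "'cf.' is discouraged for image-based identifications; consider "
     "'sp. indet.' or 'sp. inc.'"),
    (re.compile(r"\?"),
     "'?' is not a standard ON sign; use a recommended sign instead"),
]


def flag_discouraged_signs(qualifier):
    """Return a list of warnings for one identificationQualifier value."""
    return [message for pattern, message in DISCOURAGED_SIGNS
            if pattern.search(qualifier)]


for value in ["sp. indet.", "cf.", "sp. nov. aff.", "? new species"]:
    print(value, "->", flag_discouraged_signs(value))
```

A check of this kind only flags values for review. Whether a given use of cf. is appropriate, as in the Bathystylodactylus example, remains a judgment for the identifier.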

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.

AUTHOR CONTRIBUTIONS
TH and LM jointly conceived the ideas for the manuscript. All authors defined the scope of the manuscript. TH and LM wrote the initial outline. LM and WA undertook data analysis and contributed figures. All authors contributed to the writing and editing process.

FUNDING
TH, BB, DJ, NB, JD, and AG were funded by the Climate Linked Atlantic Section Science (CLASS) programme (NE/R015953/1) supported by United Kingdom Natural Environment Research Council (NERC) National Capability funding to the National Oceanography Centre. JD and AG were supported by NERC grant NE/S009426/1 "Sustained autonomous environmental monitoring of offshore oil fields" and the EU Horizon 2020 Project "EMSO-Link" grant ID 731036. LV was supported by the Flemish Contribution to LifeWatch Belgium, funded through the Research Foundation Flanders (FWO). DJ received funding through the One Ocean Hub, a collaborative research for sustainable development programme funded by United Kingdom Research and Innovation through the United Kingdom Global Challenges Research Fund under NERC grant NE/S008950/1. SP was supported by the United Kingdom Natural Environment Research Council (grant number NE/L002531/1), and a CASE studentship 'Collaborative Awards in Science and Engineering.' Workshop, September 2020, for their contribution to discussions on this topic. The authors are also grateful to the scientific team, taxonomic experts and volunteers working on the Seamounts Research Project at the Charles Darwin Foundation and members of the Deep-sea Research Project at the National Geographic Society's Exploration Technology Lab for their valuable contributions to discussions on the use of the suggested ON signs for image-based analysis in newly explored regions of the deep ocean. Images from the Galapagos and have been reproduced with permission and were obtained from the Ocean Exploration Trust's expedition NA064 under research permits PC-26-15 and PC-45-15 granted by the Galapagos National Park. This publication is contribution number (2382) of the Charles Darwin Foundation for the Galapagos Islands.