Current and Forthcoming Approaches for Benchmarking Genetic and Genomic Diversity

The current attrition of biodiversity extends beyond loss of species and unique populations to steady loss of a vast genomic diversity that remains largely undescribed. Yet the accelerating development of new techniques allows us to survey entire genomes ever faster and cheaper, to obtain robust samples from a diversity of sources including degraded DNA and residual DNA in the environment, and to address conservation efforts in new and innovative ways. Here we review recent studies that highlight the importance of carefully considering where to prioritize collection of genetic samples (e.g., organisms in rapidly changing landscapes or along edges of geographic ranges) and what samples to collect and archive (e.g., from individuals of little-known subspecies or populations, even of species not currently considered endangered). Those decisions will provide the sample infrastructure to detect the disappearance of certain genotypes or gene complexes, increases in inbreeding levels, and loss of genomic diversity as environmental conditions change. Obtaining samples from currently endangered, protected, and rare species can be particularly difficult; thus, we also focus on studies that use new, non-invasive ways of obtaining and analyzing genomic samples in cases where other sampling options are highly constrained. Finally, biological collections archiving such samples face an inherent contradiction: their main goal is to preserve biological material in good condition so it can be used for scientific research for centuries to come, yet the technologies that can make use of such materials are advancing faster than collections can change their standardized practices. Thus, we also discuss current and potential new practices in biological collections that might bolster their usefulness for future biodiversity conservation research.


INTRODUCTION
Almost every form of human activity is directly or indirectly connected to the alteration or loss of natural habitats, leading experts to define this current era as the "Anthropocene" (Lewis and Maslin, 2015; Waters et al., 2016). Paleontological records show that we are currently undergoing a higher rate of species extinction than in any previous transition between geological eras (Waters et al., 2016). During this time of rapid biological change, museum specimens collected decades or even centuries ago can be used as baselines to document more recent, human-related changes in species numbers and their distributions, in phenotypes, and in genetic variability (Billerman and Walsh, 2019). The utility of the specimens for these purposes is often dictated by the type of specimens that were collected, particularly before molecular techniques were developed or when those techniques were still under development. We now have the ability to study genetic changes at vast scales, as we can produce enormous amounts of data across the entire genome for large numbers of individuals and species.
"Biodiversity" is a blanket concept relevant across different levels of biological organization. The genetic/genomic level is increasingly relevant in this time of planetary change, as a healthy pool of genetic diversity helps populations evolve and adapt (Templeton, 1994). When current technologies are combined with comprehensive genetic sampling it is possible not only to survey genetic information in populations at the entire genome scale, but also to explore the genetic basis of different adaptive or non-adaptive phenotypes. Understanding phenotype-genotype relationships is one of the longest-standing questions in biology in general (Orgogozo et al., 2015), and is also highly relevant for conservation efforts (Allendorf, 2017). Our rapidly improving abilities to search for candidate genes with adaptive value will be advantageous for the conservation and management of species in changing environments. This is therefore a critical time to focus efforts and resources into creating and maintaining collections that allow us to benchmark current genetic and genomic diversity using current and future techniques, i.e., to establish genomic diversity "baselines." This will allow us to better understand what could be lost, and to predict what may be lost if we do not take conservation actions. Given that sample collection and preservation are costly, it is prudent to prioritize the collection and curation of certain samples over others. Here, we offer ideas that can guide sample archiving for genetic benchmarking of vertebrates. Any sample collection should, of course, be well-designed and follow statistical sampling protocols when relevant (Hayek and Buzas, 2010). Our list is not exhaustive, because questions of interest evolve over time, just as techniques do. 
The development of new techniques opens new frontiers of interesting questions, which may in turn reveal additional opportunities to leverage the benefits of genetic benchmarking (Lawson Handley, 2015; Allendorf, 2017).
Ideally, sample collection can serve dual purposes of answering questions of current interest while preserving samples as genetic benchmarks for future research. This duality of immediate and legacy benefits helps justify the substantial effort required to collect and curate samples. But many questions of implementation remain. Where should samples be collected? What populations or species should be targeted? We address these core issues in this review, and we suggest that effective genetic benchmarking could fall into at least eight broad topics of investigation: rare species; undescribed and/or cryptic species hotspots; naturally fragmented populations and isolated populations due to changing landscapes; species with continuous geographic ranges; habitat specialists vs. generalists, and range-restricted vs. widespread species; hybrid zones; newly colonizing and reintroduced populations; and changing landscapes. The ideal sources of genetic material are samples associated with vouchered individuals (Rocha et al., 2014). However, lethal collection can be impractical in certain situations, as is the case when working with endangered species, or when the research question requires dense sampling of different individuals from the same population. Therefore, we also review recent research using alternative means of obtaining genetic material, such as historical museum specimens. But since these older specimens were typically not collected for the purpose of obtaining genetic material, extracting it in sufficient quantity and quality can be challenging (McCormack et al., 2017). We contemplate the analogous possibility that the samples we are acquiring today may be suboptimal for technologies that are developed in the future, such as those focusing on analysis of proteomic data. We conclude by analyzing steps to maximize the use of samples collected today by anticipating new techniques that will likely be broadly deployed in the near future.
Given our personal backgrounds and expertise, we focus on samples from birds and other vertebrates, yet many of the topics and ideas we discuss are relevant to other kinds of organisms.

UNDERSTANDING THE EXTENT OF GENETIC AND GENOMIC DIVERSITY LOSS IN A RAPIDLY CHANGING WORLD
Here we discuss a series of categories and situations where sample collection should be prioritized, with the intention of providing a basic overview of possible justifications for purposeful collection of genetic benchmark samples. These categories are, of course, incomplete. The details of sampling designs will ultimately depend on the specific research questions being addressed, the types of organisms, and the complex considerations of logistics, permissions, and time and expense tradeoffs that pertain to any genomic benchmarking situation.

Rare and Declining Species
A fundamental need for effective conservation and management of rare and declining species is accurate estimates of both census and effective population sizes, past and present (Frankham et al., 2014; Waples, 2016). The effective population size has been defined as the size that an idealized population (i.e., one in which random mating, equal sex ratio, discrete and nonoverlapping generations, and random variation of reproductive success all occur) should have to be experiencing the same rate of genetic change as the natural population of interest (Caballero, 1994). In contrast, census population size is commonly noted to be the complete count of individuals in a population. The relationship between census and effective population sizes can be informative of demographic processes within the population (Pierson et al., 2018). They can both be genetically estimated, though some caution should be taken, considering several factors and conditions that may influence these estimates (see Box 2 in Hoban et al., 2020). Genetically derived estimates of census population sizes are increasing in number, as they are often cheaper and perhaps easier than traditional field-derived estimates, such as mark-recapture studies (Sabino-Marques et al., 2018; Bourgeois et al., 2019). Additionally, they could be perceived as being more ethical if they do not require direct interaction with individuals of endangered species (Solberg et al., 2006; Arandjelovic and Vigilant, 2018). Estimates derived from hair or fecal samples, for example, may require 2-3 times more samples than the expected number of animals in the population to arrive at acceptably precise estimates; however, the most recent technological developments may make it possible to obtain even whole-genome level coverage from these "poor-quality" samples (Taylor et al., 2020).
In addition, the same fecal samples may be analyzed with metabarcoding methods to discover information about diet and roles of animals in ecological networks (Barba et al., 2014; Barnes and Turner, 2016).
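To make the distinction between census and effective size concrete, the following minimal Python sketch applies two textbook formulas that are not taken from the studies cited above: Wright's approximation for effective size under an unequal sex ratio, Ne = 4NmNf/(Nm + Nf), and Chapman's bias-corrected Lincoln-Petersen estimator, which can be applied in a genetic capture-recapture design when individual genotypes recovered from hair or fecal samples serve as marks. The function names and all numbers are hypothetical illustrations.

```python
def effective_size_sex_ratio(n_males: int, n_females: int) -> float:
    """Wright's approximation for unequal sex ratio: Ne = 4*Nm*Nf / (Nm + Nf)."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes must contribute breeders")
    return 4.0 * n_males * n_females / (n_males + n_females)


def chapman_census_estimate(n_first: int, n_second: int, n_recaptured: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimate of census size.

    n_first: unique genotypes detected in the first sampling session
    n_second: unique genotypes detected in the second sampling session
    n_recaptured: genotypes detected in both sessions
    """
    if n_recaptured < 0 or n_recaptured > min(n_first, n_second):
        raise ValueError("recaptures must lie between 0 and the smaller session")
    return (n_first + 1) * (n_second + 1) / (n_recaptured + 1) - 1


# Hypothetical numbers, for illustration only.
ne = effective_size_sex_ratio(n_males=10, n_females=90)  # 4*10*90/100 = 36
nc = chapman_census_estimate(n_first=40, n_second=50, n_recaptured=19)
print(f"Ne = {ne:.1f}, Nc = {nc:.1f}, Ne/Nc = {ne / nc:.2f}")
```

In this toy case a strongly skewed sex ratio pushes the effective size well below the census estimate, which is the kind of discrepancy that can point to demographic processes worth investigating.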
Genomic methods can inform us of recent changes in effective population size (Luikart et al., 2020) and also of historic changes, offering the potential to provide long-term perspectives on the effects of anthropogenic change on genomic diversity (Gattepaille et al., 2016; Oldeschulte et al., 2017; Bi et al., 2019). No vertebrate long-term monitoring programs date back more than a century, and most are only a few decades old. These programs may therefore be missing long-term cycles, as the declines of some species they detect may be occurring after abundances responded positively to widespread habitat alteration prior to the advent of such surveys (Hallman et al., 2020). The apparent declines may thus not directly relate to immediate conservation problems, but fit within a longer-term pattern of abundance fluctuations. Accordingly, there has been an increase in studies incorporating the perspective on longer-term changes in population sizes applied to conservation and management decision-making (Ardren and Kapuscinski, 2003; Brüniche-Olsen et al., 2018; Sato et al., 2020). Genetic techniques provide opportunities to understand the historic context of temporal changes at much longer time scales. Past population bottlenecks, as well as precipitous declines hundreds or thousands of generations ago, can be detected (Ramakrishan et al., 2005; Oldeschulte et al., 2017), and currently, assessments of single nucleotide polymorphisms (SNPs) across the entire genome allow exploration of these questions even when very few samples are available (Brüniche-Olsen et al., 2018; Sato et al., 2020). The gray whale (Eschrichtius robustus), for example, has already lost its Northern Atlantic populations (probably due to environmental change and/or commercial whaling), and it is now only found in the Northern Pacific Ocean (Alter et al., 2015). The western gray whale population (near the coast of Asia) is estimated to be fewer than 200 individuals (Cooke, 2018). Brüniche-Olsen et al. (2018) used samples from two western and one eastern gray whales to obtain whole-genome sequences at very deep coverage (between 27X and 30X) and were able to infer that this species shows lower autosomal nucleotide diversity than most other marine mammals, and that the decline of effective population size and the extent of inbreeding are greater in the Western Pacific than in the Eastern Pacific population. Interestingly, according to niche modeling, the authors also found that future climate change could open new migratory routes that could allow gene flow and subsequent genetic recovery in the western population (Brüniche-Olsen et al., 2018).
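As a reminder of what a statistic such as nucleotide diversity measures, the sketch below computes the textbook quantity (the average proportion of sites that differ between all pairs of aligned sequences). It is a simplified illustration with made-up sequences, not the whole-genome pipeline used by Brüniche-Olsen et al. (2018), and the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(sequences: list[str]) -> float:
    """Average pairwise proportion of differing sites among aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    pairs = list(combinations(sequences, 2))
    # Count mismatched positions for every pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(seq1, seq2)) for seq1, seq2 in pairs
    )
    return total_differences / (len(pairs) * length)


# Hypothetical toy alignment: 3 sequences, 8 sites.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(toy_alignment)  # 4 differences over 3 pairs * 8 sites
print(f"pi = {pi:.4f}")
```

Real analyses of diploid genomes must additionally handle missing data, genotype uncertainty, and uneven coverage, which is why deep sequencing of even a few individuals is valuable.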
Preservation of genetic samples for benchmarking purposes could allow retrospective analyses as techniques improve and allow more precise estimates of population size and the temporal scale over which such changes occurred (Bi et al., 2019). In addition, with rapidly improving techniques and ideas in the realm of de-extinction options, cryopreservation of gametes and other reproductive tissues, even for extant yet rare populations, adds potential insurance against complete population extinction (Saragusty et al., 2016; Corlett, 2017). Cryopreserved somatic tissues could serve this purpose as well: the San Diego Zoo recently announced (September 2020) the birth of a Przewalski's horse cloned from the tissue of a male preserved 40 years ago.
Dry storage may offer an interesting alternative, considering some of the disadvantages of cryopreservation, such as complex and expensive logistics, and the need for a constant supply of energy and maintenance by trained personnel (Saragusty and Loi, 2019). Both gametes and somatic cells can also be preserved through different drying techniques, and while they may not remain viable after rehydration, DNA is preserved almost intact (Saragusty and Loi, 2019). Collection and preservation of genetic samples from rare species of conservation concern should be a priority.

Hotspots of Undescribed or Cryptic Species
One of the basic criteria for defining priority geographic areas for protection is the number of species an area harbors, in particular the number of endemic species, as these cannot be found elsewhere if such areas are damaged or lost (Giam et al., 2012; Ennen et al., 2020). An increasing number of studies are also starting to move the focus from species richness to phylogenetic diversity, a proxy that may represent aspects of biodiversity beyond that captured by species richness (Gumbs et al., 2020). In either case, the operational units used in these studies are usually already described species and do not consider estimates of undiscovered and undescribed species (Vieites et al., 2009). One of the biggest challenges in this respect is the fact that there is a large proportion of unknown biodiversity that will undergo extinction before being scientifically described ("cryptoextinctions," Giam et al., 2012). Undescribed species usually have very restricted ranges and are therefore particularly susceptible to extinction (Vieites et al., 2009).
Quantitative estimates of undescribed biodiversity are heterogeneous across taxa and geographic areas. In general, vertebrate taxonomy is much better known than that of any invertebrate taxa (Stork, 1993), and within vertebrates the estimated proportion of undescribed species is significantly higher for amphibians than for mammals (Giam et al., 2012). Proper species delimitation requires integrated assessment of genetic, phenotypic, and behavioral data. However, such assessments at large scales to define priority areas can be impractical and time-intensive. Both promoting more geographically comprehensive sampling and implementing genetic tools to analyze such samples become critical for estimating the amount of undescribed biodiversity.
The greatest numbers of undescribed species are probably found in tropical forests of the Neotropical, Afrotropical and Indomalayan regions (Giam et al., 2012). Amazonia is the largest lowland rainforest in the world, probably harboring vast numbers of undescribed anurans (Fouquet et al., 2007; Funk et al., 2012; Ferrão et al., 2016). Vacher et al. (2020) used a platform for high-throughput sequencing of small DNA fragments (Illumina MiSeq; Quail et al., 2012) to assemble a database of short mitochondrial sequences from approximately 4,500 samples of amphibians. They combined these newly generated data with approximately 6,600 accessions from the NCBI online repository and showed that the number of species could be almost twice the number currently recognized for the area (876 species vs. 440 listed by the IUCN Red List). While the selection of a species concept could impact these estimates, a strength of this study is that the authors started working with OTUs (Operational Taxonomic Units, solely based on genetic clustering), and then compared their results with recently described, valid species, finding high levels of agreement and supporting the idea that their estimated number of new species was accurate.
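The idea of delimiting OTUs purely by genetic clustering can be illustrated with a deliberately simple sketch: single-linkage clustering of aligned sequences at a fixed uncorrected distance threshold. This is an assumption-laden toy, not the delimitation procedure of Vacher et al. (2020); the sample names, sequences, and the 0.15 threshold are hypothetical and chosen only so the short example separates cleanly.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def cluster_otus(sequences: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: samples within `threshold` share an OTU."""
    names = list(sequences)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if p_distance(sequences[first], sequences[second]) <= threshold:
                parent[find(first)] = find(second)

    clusters: dict[str, set[str]] = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Hypothetical toy data: frog_a and frog_b differ at 1 of 10 sites.
toy_sequences = {
    "frog_a": "ACGTACGTAC",
    "frog_b": "ACGTACGTAT",
    "frog_c": "TTTTACGGGG",
}
otus = cluster_otus(toy_sequences, threshold=0.15)
print(len(otus), "OTUs:", sorted(sorted(otu) for otu in otus))
```

Because threshold-based clusters are sensitive to the marker and cutoff chosen, comparing them against independently described species, as the study above did, is an important check.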
This study focused on the Eastern Guiana Shield of Amazonia, where the authors recovered three bioregions altogether and determined that up to 82% of the OTUs found in this area are endemic. Interestingly, the Eastern Guiana Shield has been considered as a unique bioregion based mostly on avian species (Naka, 2011). This highlights that, while birds are among the best-known vertebrate groups in terms of taxonomy and distribution, they may not be a good proxy for other terrestrial groups given their much higher mobility.
Knowing the number of species and their abundances is an essential step in benchmarking our planet's biodiversity, but we lack this basic information for many of the most biodiverse areas of our planet. The study by Vacher et al. (2020) is just one example of how recently developed genetic and genomic techniques can help us tackle these problems, by detecting genetic variation across large spatial scales to reveal cryptic biodiversity.

Fragmented Populations Due to Natural and Anthropogenic Causes
While the description of new species is key for conservation efforts, there is general consensus that protecting the genetic diversity contained within species, in recognized subspecies or isolated populations, should also be a priority, even in widespread species still not considered as vulnerable or endangered (Thakur et al., 2018). New high-throughput sequencing techniques not only allow production of massive amounts of short DNA sequences from thousands of individuals, but they also make it possible to scan entire genomes to study more subtle patterns of genetic variation, such as those found in some fragmented populations.
The emperor penguin (Aptenodytes forsteri), for example, is considered "Near Threatened" by the IUCN because it "is projected to undergo a moderately rapid population decrease as Antarctic sea ice begins to disappear within the next few decades owing to the effects of climate change" (BirdLife International, 2020). These birds form breeding colonies on sea ice at the majority of their known colony locations (Fretwell et al., 2012), and previous studies found conflicting results in terms of the population structure between the colonies, ranging from complete demographic isolation of breeding colonies (Barbraud and Weimerskirch, 2001) to species-wide panmixia (Cristofari et al., 2016). A better understanding of the current connectivity between these colonies will inform risk assessments and management plans, since these colonies are sensitive to fluctuations in the extent and duration of the sea ice (Trathan et al., 2011; Fretwell et al., 2014). A more recent study sampled eight colonies around Antarctica and used restriction site-associated DNA (RAD) sequencing to obtain almost 4,600 genome-wide SNPs from 110 individuals (Younger et al., 2017). The colonies sampled were divided into at least four metapopulations, with the colonies in the Ross Sea being one of them. The world's largest breeding colonies of both emperor (Fretwell et al., 2012) and Adélie (Lynch and LaRue, 2014) penguins are located in the Ross Sea, which is also the only region with a predicted stable or increasing population of emperor penguins (Jenouvrier et al., 2014). Genetic tools revealed that the assumption of all colonies being demographically connected was incorrect (Younger et al., 2017). Thus, extensive sampling across fragmented populations combined with genome-wide sequencing techniques can also provide a benchmark for the degree to which apparently connected populations may be demographically isolated, influencing long-term population resilience.
Genetic change may occur especially quickly in landscapes where composition and configuration are being altered by humans (Athrey et al., 2012; Aleixo-Pais et al., 2019; Pelletier et al., 2019), leading to genetic structuring and loss of genetic diversity across populations (Amos et al., 2014; Schlaepfer et al., 2018). Therefore, sampling across landscapes that are changing due to anthropogenic causes should also receive particular attention in genetic benchmarking efforts. Effects of habitat isolation vary strongly among species, often most strongly affecting levels of connectivity and gene flow among spatially disjunct populations (Allendorf, 2017). The transformation of large portions of territories into agricultural, urban and industrial lands (and the development of traffic infrastructure to connect them) is one of the main causes of habitat loss, fragmentation and pollution (Gill and Williams, 1996; Rouget et al., 2003; Gallant et al., 2007; Rompré et al., 2008). Therefore, it is urgent to understand how these transformations affect the genetic diversity of endangered, declining, and not yet endangered species (Bani et al., 2015; Lenhardt et al., 2017).
The pace at which genetic responses to recent anthropogenic isolation appear has been difficult to measure in the past. The availability of more sensitive assays being developed by advancing technology and the possibility of sequencing entire genomes may improve our abilities to detect small changes, including evidence of inbreeding, other small-population effects, and restricted dispersal across different forms of habitat barriers (Corlett, 2017; Kozakiewicz et al., 2019; Maigret et al., 2020). For example, based on a dataset of approximately 2,000 SNPs for the copperhead snake (Agkistrodon contortrix), Maigret et al. (2020) were able to detect evidence for subtle genetic structuring closely following the path of a highway that experienced high traffic volumes between 1920 and 1970 in eastern Kentucky, United States, but has now lost most traffic to a newly constructed alternative route. Their results add evidence revealing subtle impacts of anthropogenic fragmentation of landscapes, but also highlight the importance of temporal factors in landscape genetics, showing that temporal lags may impact our ability to detect the detrimental effects of land use change. The ability to detect subtle genetic structure across populations can help implement conservation management plans earlier, and therefore improve chances of successful protection of genetic diversity (Ralls et al., 2018).
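Genetic structure between populations, such as groups separated by a road, is commonly summarized with a fixation index. The sketch below computes a basic heterozygosity-based FST, (HT - HS)/HT, from allele frequencies at biallelic SNPs in two populations, combining loci as a ratio of sums. It assumes equal population weights and ignores sample-size corrections, so it is only an illustration of the concept and not the estimator used in the studies cited above; the function names and frequencies are hypothetical.

```python
def locus_heterozygosities(p1: float, p2: float) -> tuple[float, float]:
    """Return (HS, HT) for one biallelic locus given allele frequencies p1, p2."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    # HS: mean expected heterozygosity within the two populations.
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # HT: expected heterozygosity of the pooled population (equal weights).
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    return hs, ht


def fst(frequencies: list[tuple[float, float]]) -> float:
    """Multi-locus FST as sum(HT - HS) / sum(HT) across loci."""
    sum_hs = 0.0
    sum_ht = 0.0
    for p1, p2 in frequencies:
        hs, ht = locus_heterozygosities(p1, p2)
        sum_hs += hs
        sum_ht += ht
    if sum_ht == 0:
        raise ValueError("no polymorphic loci: FST is undefined")
    return (sum_ht - sum_hs) / sum_ht


# Hypothetical allele frequencies on either side of a barrier.
weak_structure = fst([(0.40, 0.60), (0.50, 0.50), (0.30, 0.35)])
print(f"FST = {weak_structure:.4f}")
```

With thousands of SNPs, even very small values of such an index can be distinguished from zero, which is what makes subtle, recent structuring detectable.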
Our ability to detect effects of land use change on population connectivity also depends on spatial scale of analysis. Kozakiewicz et al. (2019) sampled 271 bobcats (Lynx rufus) obtained from five populations in southern California, between Los Angeles and San Diego. Based on more than 13,000 SNP loci, landscape genomic effects were most frequently detected at the study-wide spatial scale, as predicted. However, negative effects of urban land cover on connectivity were also revealed when analyzing each population separately, with these negative effects being particularly strong in one population where stream habitat had been lost (Kozakiewicz et al., 2019). This is particularly interesting because knowing which landscape features can mitigate reduced connectivity in urban areas, such as riparian corridors in this case, can make the case for better conservation planning when continued urbanization is unavoidable.

Transecting Geographic Ranges
An unresolved question regards the patterns of genetic diversity across species' geographical ranges, even when distribution is or seems continuous. The long-standing but still controversial central-marginal hypothesis (CMH) suggests that genetic diversity should decline as one moves from the middle of the range, where species tend to be most common, to the periphery, where the species' distribution becomes more fragmented, presumably because habitat conditions become less suitable and population sizes decline (Eckert et al., 2008; Pironon et al., 2017). Evidence for the hypothesis has been mixed (Sinai et al., 2019; Ntuli et al., 2020). The definition of "marginal," whether it be geographical, ecological or genetic, influences evaluations (Eckert et al., 2008; Pironon et al., 2017).
From the genetic perspective, the amount of data used may affect inference about diversity patterns across geographical space. In a meta-analysis of almost 250 studies published between 1968 and 2014, the probability of detecting a central-marginal pattern was not related to the genetic methods used by the studies considered (Pironon et al., 2017). However, our abilities to produce genetic data have increased dramatically since 2014. The studies discussed in the previous section are only two of many examples of how larger datasets, both in terms of sampled individuals and SNPs scanned, can detect shallow but significant genetic differentiation that went undetected with previously available methods (Chattopadhyay et al., 2016; Aguillon et al., 2018; Clucas et al., 2019; Nascimento et al., 2019).
We anticipate even greater sensitivity to small but important genetic differences as technology improves, which will certainly be helpful for understanding the bigger and more challenging question of which processes led to these observed patterns.
A recent study used approximately 30,000 SNPs to test the main predictions of the CMH in the ongoing invasion of the cane toad (Rhinella marina) in Australia (Trumbo et al., 2016). The authors defined populations in the northern and eastern Australian coasts as the "core" populations and then collected samples along six continuous transects into interior Australia, where arid habitats and cold temperatures currently limit their distributions. Their results were mixed, with only some transects revealing what was predicted by the CMH, and highlighted the importance of environmental and climatic factors in shaping the patterns of variation in genomic diversity within continuous population ranges (Trumbo et al., 2016). Lower genetic diversity in edge populations could be one of the reasons such populations cannot evolve traits that would allow them to expand their ranges.
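One simple way to frame the central prediction of the CMH along a transect is as a regression: if diversity declines toward the margin, the slope of a diversity measure against distance from the core should be negative. The sketch below fits an ordinary least-squares slope to made-up values; it is not the analysis of Trumbo et al. (2016), and the distances, heterozygosity values, and function name are hypothetical.

```python
def least_squares_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least two")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    if sum_xx == 0:
        raise ValueError("x values must not all be identical")
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sum_xy / sum_xx


# Hypothetical transect: distance from the core (km) and expected
# heterozygosity at each sampling site.
distance_km = [0.0, 100.0, 200.0, 300.0]
heterozygosity = [0.30, 0.28, 0.26, 0.24]

slope = least_squares_slope(distance_km, heterozygosity)
print(f"change in heterozygosity per km: {slope:.6f}")
print("consistent with CMH" if slope < 0 else "not consistent with CMH")
```

A full test would also need uncertainty estimates and environmental covariates, since, as the mixed results above show, climate and habitat can override a purely geographic gradient.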
Most studies have assumed greater population sizes near range centers and have not explicitly linked genetic data with population size estimates. Indeed, genetic diversity could simply be greater where abundance is greater (Hague and Routman, 2016; Allendorf, 2017), but alternative hypotheses suggest peripheral populations, if they are spatially distinct from central populations and experience limited gene flow, may be more genetically distinct because selective pressures in marginal environmental conditions are intense and differ from pressures in the center of the range. Peripheral populations may possess abilities to respond to changes and therefore may be key to a species' abilities to respond to climate change and other stressors (Lavergne et al., 2010). An interesting case is that of the redbelly yellowtail fusilier (Caesio cuning), an Indo-Pacific reef fish with a bipartite life history, first as pelagic larvae and later settling on coral reefs as juveniles. Adults depend on reef structure for protection at night, and do not migrate. Altogether, this suggests that long-distance dispersal in this species requires a strong oceanographic conduit. Using approximately 2,500 SNPs generated from RAD sequencing, Ackiss et al. (2018) found evidence of reduced genetic diversity in the peripheral populations of this species in relation to the Kuroshio Current, a powerful western-boundary current in the Pacific Ocean. The authors found that sites closest to the periphery exhibited increased within-population relatedness and decreased effective population size, and potential for local adaptation. Further studies analyzing both genetic variability and effective population sizes could help us better understand differences in the genomes of central and peripheral populations.
Therefore, thoughtful selection of species to sample along transects from the center of current ranges to margins could help future scientists to understand what aspects of genomes have changed through time and to identify which locus or loci may have been under the strongest selection and favored success or failure to adapt and persist (Macdonald et al., 2017). In addition, more complete sampling across carefully chosen suites of species could better inform current basic questions about patterns of genetic diversity, such as the central-marginal hypothesis. We already have extensive evidence of shifts in geographic ranges associated with climate change for many species (Shoo et al., 2006; Chen et al., 2011; Pecl et al., 2017) and also forecasting models that have generated predictions of how geographic ranges are expected to change (Lawler et al., 2009; Guisan et al., 2013). Such models could form the basis for selection of taxa for further genetic study, which in turn can better inform future models, as some inconsistencies arise between predictions and observations (Walsh E.S. et al., 2019). One possible cause of such inconsistencies is that traditional modeling does not account for the ability of some species to adapt to change, which recent models are trying to incorporate to improve predictive accuracy (Nadeau and Urban, 2019; Peterson et al., 2019).

Habitat Specialists Versus Generalists
Comparatively little is known about relationships between genetic diversity, niche breadth and adaptability of vertebrate populations to environmental change. Most studies to date have focused on plants (Sexton et al., 2017), though some studies in animals show similar trends: specialist species tend to have deeper and finer-scale phylogeographic structure and stronger demographic fluctuations when compared to closely related generalist species (Silva et al., 2017; Engelbrecht et al., 2019).
Extreme specialists offer interesting models to study the genetic basis of certain phenotypes, and to better understand how changing conditions can affect different species and their interactions. The saltmarsh (Ammospiza caudacutus) and Nelson's (A. nelsoni) sparrows are two recently diverged species (∼600,000 years; Rising and Avise, 1993) commonly considered marsh endemics. However, the saltmarsh sparrow is a narrow niche specialist, while the Nelson's sparrow can be found in a broader range of habitats (see Walsh J. et al., 2019 and references therein). Lower nesting success in tidal marshes has been reported for the Nelson's sparrow, suggesting adaptive differences between these species (Maxwell, 2018). Walsh J. et al. (2019) analyzed genome-wide divergence between these species and found several candidate genes to be linked to adaptation to tidal marsh environments, including genes linked to osmotic regulation, circadian rhythm, and plumage melanism.
We generally assume that habitat generalists should have advantages in dynamic environments, but what is the underlying genetic basis for niche breadth variation and ability to adapt to changing conditions? Genetic benchmarks establishing current levels of diversity, along with measurements of niche breadth generated from field observations and habitat analysis, would improve our understanding of the temporal plasticity in niche characteristics and how that plasticity associates with dynamics of the genome.

Range-Restricted Versus Widespread Species
The relationship between extent of geographic range and niche breadth is generally positive (Slatyer et al., 2013), resulting in some species having expansive geographic ranges, whereas others are restricted to small areas. Given this relationship, one might predict that widespread species have higher levels of genetic diversity and greater resilience to environmental change, whereas species restricted to disappearing habitats and already in low abundance require immediate attention.
Identification and analysis of relevant functional loci and how those vary across time and space could facilitate accurate assignment of populations to conservation-relevant risk categories. The willow flycatcher (Empidonax traillii) is an interesting case, because the species as a whole is widespread across North America, but one of its four subspecies, the southwestern willow flycatcher (E. t. extimus), is native to the Desert Southwest of the United States and restricted to riparian woodlands along waterways (Sedgwick, 2020). These habitats probably provide a refuge against the extreme temperatures of this region (Chen et al., 1999), and with the loss of these habitats this subspecies has been undergoing a steady decline, with an estimate of no more than 500 breeding pairs in an assessment from 20 years ago (Sogge et al., 1997). Temperature increases due to climate change are expected to worsen the situation, which motivated Ruegg et al. (2018) to use genomic techniques to study local adaptation to extreme temperatures in the southwestern willow flycatcher and assess its vulnerability to future climate change. The authors assembled a reference genome for the species and then analyzed more than 100,000 SNPs from more than 150 individuals across 22 localities (Ruegg et al., 2018). By incorporating a series of climate variables into their analyses, they were able to identify a set of genes of potential importance for thermal regulation, and to assess the "genomic vulnerability" to predicted climate change of the different lineages within the willow flycatcher. As expected, the already endangered southwestern willow flycatcher was the lineage most vulnerable to the anticipated increases in heat waves (Ruegg et al., 2018).
How dynamic are these relationships across time and across spatially variable environmental conditions? Are there underlying genomic differences across lineages that might reveal mechanisms allowing greater tolerance to environmental variability? Again, comprehensive sampling may be required to answer these questions, keeping in mind that current representation of organisms in museum and biological collections may be biased toward species with broader distributional ranges (Boakes et al., 2010;Vale and Jenkins, 2012).

Hybrid Zones
Hybrid zones, where the ranges of two lineages exchanging genes meet, inform us of the pace, pattern and process of speciation (Hewitt, 2001; Gompert et al., 2017). They may be relatively stable in location or may shift over time (Buggs, 2007). The genomic dynamics of hybrid zones vary across lineages and certainly through time and space. Monitoring these movements generally requires genetic data, as phenotypes rarely reflect the underlying genomic dynamics. What is more, certain regions of the genome can be more resistant to gene flow than others (Wolf and Ellegren, 2017). Although current locations of many vertebrate hybrid zones are well-known, many are sparsely sampled, particularly where phenotypic signals are cryptic among poorly known taxa (Allendorf et al., 2001). Geographically structured samples collected to provide benchmark genetic data can help us quantify the temporal and spatial patterns of gene flow and introgression, and infer the ancestral origins of genotypes, by establishing additional historical bases for comparisons (Carling et al., 2011). Methods for investigating hybrid zones and current research directions have been recently summarized (Gompert et al., 2017). In addition, as landscape characteristics change along hybrid zones, patterns of gene exchange may also change.
Temperature shifts, for example, can have significant effects on species distributions and the dynamics of hybrid zones (Taylor et al., 2014; Ryan et al., 2018). Ectotherms, such as North American box turtles (Terrapene spp.), are particularly susceptible to temperature changes. Martin et al. (2020) assembled a dataset of samples from more than 350 individuals across two well-studied zones of hybridization within this genus: one in the southeastern United States, between the woodland (T. carolina carolina), Gulf coast (T. c. major), three-toed (T. carolina triunguis) and Florida (T. bauri) box turtles, and the other in the midwestern United States, between one subspecies of the ornate box turtle, T. ornata ornata, and T. c. carolina (see Martin et al., 2020 and references therein). Based on these replicated instances of contact at the intra- and interspecific levels, the authors were also able to study the contrasting effects of selection and migration on hybridization. Analyzing more than 10,000 unlinked reference-mapped loci, they found that while in the midwestern contact area hybrids are present in low numbers and restricted to the F1 generation, the southeastern contact area included many backcrosses and F2 individuals, providing evidence of higher levels of introgression between the taxa. Interestingly, they found a set of specific loci with steep genomic clines between taxa, strongly correlated with temperature variables, but not with any precipitation or wind-related variables (Martin et al., 2020). The authors interpreted this as evidence of thermal gradients having a strong effect on introgression patterns and predicted that future changes in temperature could significantly affect the integrity of species boundaries within this genus of turtles (Martin et al., 2020).
A modern offshoot of natural hybrid zones involves the potential intermixing of genes from wild versus captively raised and released animals (Kitada, 2018). This is particularly true for economically important fish species such as salmon (Einum and Fleming, 1997;Clifford et al., 1998;Glover et al., 2017). Genetic benchmark samples of less economically important populations may provide similar opportunities to understand potential introgression and gene flow between native and released populations, especially given the extensive movement of organisms out of their native range by humans (Vitousek et al., 1997;Costello and Solow, 2003).

Newly Colonizing Populations and Reintroductions
Changes in allelic diversity that allow some populations of vertebrates to survive and thrive in new environments can be explored if samples are collected relatively soon after colonization is detected. Most colonizations and reintroductions fail, whether they are natural or anthropogenic in origin (Blackburn and Duncan, 2001). Reasons are many, but data on the specific roles that functional locus or loci may play in enhancing probability of success are sparse. Genetic benchmarks of newly arriving populations may reveal drivers of success or failure, and help identify situations where recolonization of eradicated invasive species is less likely (Purcell and Stockwell, 2015).
Describing the genetic characteristics of organisms used in translocation or reintroduction programs, and then resampling the population several generations later, could reveal which individuals or lineages established successfully and which failed. Such information could improve efficiency in the choice of individuals for future conservation translocation projects (Barba et al., 2010). The alpine ibex (Capra ibex) is a species of European wild goat that recovered from fewer than 100 individuals to approximately 50,000 in a century (Grossen et al., 2018). After genotyping more than 100,000 SNPs from 170 individuals, Grossen et al. (2018) could detect the footprints of the reintroduction strategy. Despite this encouraging recovery in numbers of individuals, the authors found that all reintroduced populations had lower levels of genetic diversity than the source population, both individually and combined. This could be related to the reintroduction plan used with this species, which consisted of initial reintroductions from captive breeding followed by secondary reintroductions from established populations. This is a nice example of how genetic benchmark samples can serve the immediate purpose of ensuring that a sufficiently diverse sample of individuals is being introduced, perhaps reducing the chances of inbreeding issues developing, and can also inform us of patterns of success when the initial benchmark samples are compared with future samples.
Genetic assessment of individuals prior to their use in reintroduction programs is also necessary to avoid including those that show signs of hybridization with other species. A particular problem arises when domesticated species are not reproductively isolated from their wild relatives, as is the case of several ungulate species in Europe (Iacolina et al., 2019). Genetic benchmarks could help avoid introgression of artificially selected variants into wild populations. The European mouflon (Ovis aries musimon), the wild relative of the domestic sheep, became extinct in mainland Europe by the Neolithic, but remnants from the first wave of sheep domestication that brought them to the Mediterranean islands of Corsica and Sardinia established feral populations (Chessa et al., 2009). Now considered "historically autochthonous," the species is protected by regional laws after almost becoming extinct due to intense hunting and erosion of its habitat (Somenzi et al., 2020). There has also been evidence of extensive hybridization with domestic sheep since Roman times, with confirmed adaptive introgression of loci related to immunity mechanisms from mouflon to sheep, but not the other way round (Barbato et al., 2017). Yet, as individuals are being relocated within the islands and to mainland Europe, it is important for future management to know the ancestry of individuals. Somenzi et al. (2020) used a machine learning procedure to screen more than 50,000 SNPs from non-admixed mouflons and sheep from Sardinia, and from confirmed admixed individuals, generating panels of reduced numbers of SNPs that could be used as Ancestry Informative Markers (AIMs). These AIMs represent fast, low-cost tools to identify the ancestry of a given individual; the study therefore provided both a tool to contribute to the conservation of this species and a new methodology that can be applied to the conservation of other wildlife at risk of hybridization with domestic species.
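The idea behind a reduced panel of ancestry-informative markers can be illustrated with a much simpler criterion than the machine learning procedure of Somenzi et al. (2020). The sketch below is not their method: it ranks SNPs by the absolute difference in alternate-allele frequency between two non-admixed reference groups (often called delta), a common first-pass heuristic. The function names, the 0/1/2 genotype coding, and the use of None for missing calls are assumptions made for illustration only.

```python
def alt_allele_freq(genotypes):
    """Alternate-allele frequency at one biallelic SNP.

    genotypes: per-individual calls coded 0, 1, or 2 (copies of the
    alternate allele), with None for a missing call (assumed coding).
    Returns None if no individual was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def rank_candidate_aims(ref_a, ref_b, top_k):
    """Rank SNPs by |p_a - p_b| between two reference groups.

    ref_a, ref_b: dicts mapping a SNP identifier to the list of genotype
    calls in each non-admixed reference group (hypothetical input format).
    Returns the top_k (snp_id, delta) pairs, largest delta first.
    """
    scored = []
    for snp_id, calls_a in ref_a.items():
        if snp_id not in ref_b:
            continue  # SNP not genotyped in both groups
        p_a = alt_allele_freq(calls_a)
        p_b = alt_allele_freq(ref_b[snp_id])
        if p_a is None or p_b is None:
            continue  # no usable calls in one of the groups
        scored.append((abs(p_a - p_b), snp_id))
    # Sort by decreasing delta; break ties by SNP identifier for stability.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(snp_id, delta) for delta, snp_id in scored[:top_k]]
```

A SNP fixed for different alleles in the two reference groups has delta = 1 and is maximally informative; in practice a panel chosen this way would still need validation against individuals of known admixed ancestry, as was done in the study described above.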

Species Benefiting From Anthropogenic Novelty
All habitats created by humans are novel on evolutionary time scales. Our agricultural habitats may mimic some natural habitats in structure, but plant species composition is shifted dramatically. This undoubtedly changes food resource availability as well as distribution and abundance of reproductive niches. Furthermore, novel chemicals are encountered as they are applied to control pests. Likewise, urban and suburban habitats in the modern era are home to sets of species that probably rarely co-existed in the past, including pathogens that may challenge immune function in novel ways.
While many organisms experience population fragmentation and loss of genetic diversity due to urbanization (see above), others may actually benefit from "urban facilitation" depending on their life history strategy. Many invasive species become dependent on resources provided by humans and therefore thrive in cities (Hulme-Beaman et al., 2016; Johnson and Munshi-South, 2017). Urbanization may thus facilitate dispersal and expansion of invasive species, which in turn may aggravate the threats to native biodiversity. Such is the case of feral pigeons (Columba livia) in the eastern United States, which showed higher-than-expected gene flow under an isolation by distance model within large cities (Boston, Providence, New York City, Philadelphia, Baltimore, and Washington, DC; Carlen and Munshi-South, 2021). This means that the development of large human settlements and their increasing connectivity are facilitating the expansion of an invasive species, and the same is probably true for many other "human commensals" (Johnson and Munshi-South, 2017).
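To make the isolation by distance expectation concrete: under that model, genetic differentiation between pairs of populations increases with the geographic distance separating them, and a common summary is the slope of linearized Fst, Fst/(1 - Fst), regressed on the logarithm of distance. The sketch below fits that line by ordinary least squares. It is an illustration only and is not claimed to be the analysis used by Carlen and Munshi-South (2021); the function name and input format are assumptions.

```python
import math


def ibd_regression(pairs):
    """Least-squares fit of Fst / (1 - Fst) on ln(distance).

    pairs: list of (distance_km, fst) tuples, one per pair of populations
    (hypothetical input format). Distances must be > 0 and Fst < 1.
    Returns (slope, intercept).
    """
    xs = [math.log(dist) for dist, _ in pairs]
    ys = [fst / (1.0 - fst) for _, fst in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all pairs have the same distance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Pairs of populations that fall well below the fitted line are less differentiated than their separation predicts, which is one way to read "higher-than-expected gene flow". Because pairwise values are not independent, significance is usually assessed with a permutation approach such as a Mantel test rather than with standard regression statistics.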
Samples collected to establish genetic benchmarks in time provide opportunities to understand the evolution of plasticity in response to human modification of habitats. What genetic mechanisms allow some species to be "winners," adjusting to and even thriving in human-altered landscapes, while other species decline and disappear?
We have proposed several broad subjects of study to be considered as priorities for future collection of genetic benchmark samples. We also recognize the importance of, and encourage, the publication of Data Papers with appropriate and extensive metadata to alert future researchers to the existence of vertebrate genetic samples and facilitate their appropriate future use (Deck et al., 2017). Such tools and papers will be helpful for development of formal prioritization and assessment processes, similar to efforts to identify collection priorities aimed at preserving wild crop plant genetic diversity (Castañeda-Álvarez et al., 2016).

SURVEYING PAST AND CURRENT GENOMIC DIVERSITY FROM NON-INVASIVE AND HISTORICAL SAMPLES
Collection of samples with an associated voucher is scientifically the best option by far because it maximizes the potential information obtainable from each specimen (Rocha et al., 2014; Webster, 2018). However, some species in urgent need of genetic analyses are already endangered and the only available sources of genetic material are non-invasive samples. The possibility to transition from "conservation genetics" to "conservation genomics" raised a potential issue, as some of the technologies collectively referred to as "next-generation sequencing" techniques required higher concentrations of DNA than are usually possible to obtain from non-invasive samples. But as technologies progressed and costs decreased, methods have increasingly been adapted to work with smaller amounts of DNA. For example, Russello et al. (2015) used non-invasive snares to collect hair samples from American pika (Ochotona princeps); after extracting DNA they followed a nextRAD genotyping-by-sequencing method that allowed them to identify and genotype 3,803 high-confidence SNPs from 67 out of the 96 hair samples. The American pika is a small lagomorph with a discontinuous distribution along mountainous areas throughout western North America. Still considered of "least concern" by the IUCN Red List (Smith and Beever, 2016), it has become a focal species for studies of population dynamics and extinction risk due to climate change (Peacock and Smith, 1997; Stewart et al., 2015). Contrary to previous results across elevationally distributed sites in British Columbia, Canada (Henry et al., 2012), Russello et al. (2015) found that sites at the lower fringe of the American pika distribution in North Cascades National Park exhibited significantly lower levels of gene diversity, as well as a heterozygote deficit likely due to inbreeding.
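The two quantities mentioned in the pika example, gene diversity and heterozygote deficit, can be made concrete with a short calculation. The following is a minimal single-locus sketch, assuming biallelic SNP genotypes coded as 0, 1, or 2 copies of the alternate allele and None for missing calls; the function name and coding are illustrative assumptions and do not reproduce the analysis of Russello et al. (2015). Expected heterozygosity is computed as 2p(1 - p) without a small-sample correction, and the inbreeding coefficient is F_IS = 1 - H_obs / H_exp, so positive values indicate fewer heterozygotes than expected under Hardy-Weinberg proportions.

```python
def heterozygosity_summary(genotypes):
    """Observed and expected heterozygosity and F_IS at one biallelic SNP.

    genotypes: per-individual calls coded 0, 1, or 2 (copies of the
    alternate allele), with None for a missing call (assumed coding).
    Returns (h_obs, h_exp, f_is); f_is is None for a monomorphic locus.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes at this locus")
    p = sum(called) / (2 * n)                      # alternate-allele frequency
    h_obs = sum(1 for g in called if g == 1) / n   # share of heterozygotes
    h_exp = 2 * p * (1 - p)                        # gene diversity, uncorrected
    f_is = None if h_exp == 0 else 1 - h_obs / h_exp
    return h_obs, h_exp, f_is
```

For a population-level estimate, observed and expected heterozygosity would normally be averaged over thousands of SNPs before computing F_IS, and missing data and small sample sizes (both common with non-invasive samples) should be accounted for.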
In many other cases, minimally invasive but non-lethal sampling is a possibility. As indicated earlier, collection of samples with an associated voucher is widely considered to be best practice, but we emphasize that not being able to associate a voucher with a genetic sample should not necessarily discourage collection of material for DNA extraction. Blood extraction from birds, for example, followed by release of the individual is a good option when it is impractical to euthanize individuals (Figures 1A,B). This is particularly the case for non-mammalian vertebrates, whose red blood cells are nucleated and are therefore a good DNA source. In such cases lacking traditional voucher specimens, the production of some type of "e-voucher" (i.e., electronic voucher, such as a photograph; Astrin et al., 2013) becomes a priority. Electronic vouchers such as photographs are often obtained in non-controlled environments where it may not be possible to follow the steps of high-quality protocols, such as using standard lighting. However, including low-cost size and color references is a simple way of increasing the scientific value of the e-voucher. Also, depending on the taxon, more than one photograph may be required, providing details of different parts of the body containing diagnostic characters. Therefore, members of collecting expeditions should familiarize themselves with such requirements to produce proper e-vouchers, following published protocols or designing and archiving their own.
Biological collections often welcome salvaged specimens (i.e., those found dead, Figure 1C) as they can produce viable samples for DNA extraction, and often some type of voucher can be associated with them. Salvaged specimens can be found by scientists during collection expeditions, but many are found by citizens or recovered by authorities from illegal hunting, pet trade, etc. Barone et al. (2020) assessed the relevance of avian tissues obtained from donated and confiscated materials for the National Ultrafrozen Tissue Collection of the Museo Argentino de Ciencias Naturales "Bernardino Rivadavia." They found that, out of a total of almost 10,300 avian tissues deposited at the collection, over one third come from donations and confiscated specimens, i.e., specimens found and donated by citizens or other institutions or confiscated by authorities from illegal trade. Interestingly, 18% of the species in the tissue collection are represented only by samples that come from donations and confiscated material (Barone et al., 2020).
Another alternative for assessment of genetic benchmarks when collection of new samples is limited is the use of existing specimens in biological collections (Figure 1D). Museums have been accumulating biological collections for over two centuries, but for most of that time there were no means and no intention to preserve tissue for genetic analysis (as the majority of the specimens in the world's biological collections were obtained before the discovery of the role of DNA). Yet these collections represent the only resources to study the genomic diversity of extinct species, or of species that can no longer be collected for other reasons. Methods to extract DNA from museum specimens have been under development for decades, with the challenge that historical DNA is degraded by fragmentation. The previously available techniques were designed to target specific regions of the genome and accurately copy long (typically 300-1,500 bp) stretches of DNA, and required DNA amplification steps that are very susceptible to contamination (Hykin et al., 2015; McCormack et al., 2017). The development of high-throughput sequencing brought new hope, as these methods typically produce reads as short as 50-150 nucleotides, making it easier to recover genetic data from old specimens, especially those preserved as dry preparations (Yeates et al., 2016; McCormack et al., 2017; Ruane and Austin, 2017; Pierson et al., 2020).
Historical specimens can complement fresh tissues to assemble geographically comprehensive datasets, which are critical to detect genetic structure within a clade and inform conservation plans. The red-tailed black-cockatoo (Calyptorhynchus banksii) is an Australian species with five currently recognized subspecies based on body and bill size and plumage color patterning (Ford, 1980). Despite the species being common in many locations, the rarity of some of its populations and subspecies, combined with its wide geographic range, makes assembling a species-wide set of samples challenging. Ewart et al. (2019) used a restriction site-associated DNA marker approach (DArTseq™, Diversity Arrays Technology, Australia) to obtain thousands of SNPs from 113 fresh tissue samples and 29 (out of 47 included) toepads, with a mean age of 44 years, ranging from 5 to 123 years. Using two different pipelines to process and filter the data, the proprietary DArTsoft14 and STACKS (Catchen et al., 2011), the authors obtained 6,389 SNPs (with 4.19% missing data) and 2,745 SNPs (with 11.6% missing data), respectively. Interestingly, the authors also combined fresh and historical samples in different datasets to evaluate how the inclusion or exclusion of the old samples affected their results. They found that, while most data sets showed the same patterns of differentiation among the five populations based on Fst values, both the bioinformatic pipeline and the samples included in SNP calling had a large effect on the Fst values obtained, which led to considerable variation in estimates of the scale of population differentiation (see Table 3 and Supplementary Tables 3, 4 in Ewart et al., 2019).
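Why filtering choices can shift Fst is easier to see with a small calculation. The sketch below is an illustration under stated assumptions and is not the DArTsoft14 or STACKS pipeline used by Ewart et al. (2019): it computes Hudson's Fst between two populations as a ratio of averages across SNPs (the estimator recommended by Bhatia et al., 2013, a reference not cited in this review) after dropping SNPs whose fraction of missing genotypes exceeds a threshold. The function names, the 0/1/2/None genotype coding, and the `max_missing` parameter are hypothetical.

```python
def allele_counts(genotypes):
    """Return (alternate-allele count, number of called chromosomes)."""
    called = [g for g in genotypes if g is not None]
    return sum(called), 2 * len(called)


def hudson_fst(pop1, pop2, max_missing=0.2):
    """Hudson's Fst as a ratio of averages over SNPs passing a filter.

    pop1, pop2: lists of SNPs in the same order; each SNP is a list of
    genotype calls coded 0/1/2, with None for missing (assumed coding).
    max_missing: drop a SNP if its missing fraction across both
    populations exceeds this value (hypothetical filter).
    Returns (fst, number_of_snps_kept); fst is None if undefined.
    """
    num_sum = 0.0
    den_sum = 0.0
    kept = 0
    for calls1, calls2 in zip(pop1, pop2):
        total = len(calls1) + len(calls2)
        missing = sum(g is None for g in calls1) + sum(g is None for g in calls2)
        if missing / total > max_missing:
            continue  # SNP fails the missing-data filter
        a1, n1 = allele_counts(calls1)
        a2, n2 = allele_counts(calls2)
        if n1 < 2 or n2 < 2:
            continue  # no called individuals in one population
        p1, p2 = a1 / n1, a2 / n2
        # Numerator: squared frequency difference, corrected for sampling.
        num_sum += ((p1 - p2) ** 2
                    - p1 * (1 - p1) / (n1 - 1)
                    - p2 * (1 - p2) / (n2 - 1))
        # Denominator: probability that alleles drawn from each population differ.
        den_sum += p1 * (1 - p2) + p2 * (1 - p1)
        kept += 1
    if den_sum == 0:
        return None, kept
    return num_sum / den_sum, kept
```

Running the same genotypes through a strict and a lenient `max_missing` threshold retains different SNP sets and can give noticeably different estimates, which is the kind of sensitivity to be expected when degraded historical samples, with their higher rates of missing data, are combined with fresh tissue.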
Historical DNA can also be extremely useful to evaluate changes in genetic diversity over time in highly endangered taxa. van der Valk et al. (2019) were able to infer genomic changes in the last century in the two subspecies of eastern gorillas, Grauer's (Gorilla beringei graueri) and mountain gorillas (G. b. beringei). The authors first performed low-depth sequencing with historical DNA extracted from teeth and dried soft-tissue samples of 59 gorilla specimens and followed a series of steps that ended in seven Grauer's and four mountain gorilla samples collected between 1910 and 1962 with adequate coverage (3.1-10.8X) to assess genomic changes. The Grauer's gorilla has a historically higher genetic diversity than that of the mountain gorilla, which the authors attribute to a period of population growth and expansion between 5,000 and 10,000 years ago. However, in the short time period spanned by this study (about 100 years, corresponding to 4-5 gorilla generations), the Grauer's gorillas showed evidence of a significant reduction of genetic diversity as well as an increase in inbreeding and genetic load (van der Valk et al., 2019). Those results may be related to an 80% reduction in population size, to fewer than 4,000 individuals, in the last 20 years. The much smaller population of mountain gorillas, in contrast, has experienced little genomic change in the studied period. While they have also experienced demographic changes, their population size has remained small and more stable, decreasing from fewer than 1,000 individuals to approximately 250 between the 1950s and the 1980s, and then recovering to ∼450 in 2013. On the one hand, this study demonstrates the negative genomic consequences that severe population declines during the last century can have, even in a species with long generation times. On the other, it suggests that conservation efforts that slow population declines, even when unable to prevent them, can still help alleviate their negative genomic impacts.
Recovering genetic data from old specimens preserved as dry preparations has become routine (Payne and Sorenson, 2010; Lim and Braun, 2016; McCormack et al., 2016; Tsai et al., 2020). But many extinct, endangered and secretive amphibians and non-avian reptiles have been preserved mostly as formalin-fixed and fluid-stored specimens, from which obtaining genetic data remains challenging or impossible (Simmons, 2014; Hykin et al., 2015; Pierson et al., 2020). The DNA contained in formalin-fixed specimens is highly degraded by fragmentation, base modification and cross-linkage within the DNA or between DNA and proteins. High-throughput sequencing technologies that require only short DNA fragments, combined with bioinformatic tools that allow detection and filtering of low-quality sequences, may provide the opportunity to obtain genomic information from fluid-preserved specimens. Snakes are among the most poorly studied clades within vertebrates for reasons inherent to their biology (their habits make them difficult to find and collect) and also related to their threatened status, which limits collecting opportunities (Ruane and Austin, 2017). For many species the only potential source of genetic material is old specimens that were formalin-fixed immediately after collection. Recently, Ruane and Austin (2017) presented a modified DNA extraction protocol which, combined with high-throughput sequencing, allowed them to recover DNA from 10 formalin-fixed and fluid-preserved snakes for which there is little or no modern genetic material available in public collections. Including specimens collected more than 100 years ago, the authors were able to sequence ultraconserved elements (2,318 loci), which they combined with data from modern samples to build a phylogeny that included some enigmatic and poorly known species for the first time, such as the Günther's mountain snake Xylophis stenorhynchus (endemic to the Western Ghats, India) and the Bougainville coral snake Parapistocalamus hedigeri (restricted to Bougainville Island in the North Solomon Islands group, Papua New Guinea). Both species have very restricted ranges and are categorized as "Data Deficient" by the IUCN Red List (Hamilton et al., 2013; Srinivasulu et al., 2013).

MOVING FORWARD: COLLECTING AND PRESERVING SAMPLES TODAY PLANNING FOR THE FUTURE
Most protocols for collecting and preserving samples are developed and adapted according to the needs of the technologies available at the moment of collection, yet their objective is to make the material useful to the generations to come. As we see scientists working hard to develop new tools and protocols to obtain DNA from material that was collected even before the DNA molecule was described, an important question arises: is there a way to reverse the story and collect and preserve biological material in a way that anticipates the technologies or applications of the future?
Documenting genetic diversity does not stop at finding variant sites in the genome, as that is only one of the dimensions of variation at the molecular level. In an interesting essay about the role of museums in the Anthropocene, Campagna and Campagna (2012) reflect that museums can only preserve "anatomical" structures and not "functions": the plant can be preserved, but not its photosynthetic process. While this is still strictly true, we are already preserving the RNA (transcriptome) and proteins (proteome) that result from the expression of the coding parts of the genome. We can get closer to preserving biological functions by establishing and preserving cell cultures (Wong et al., 2012; Yohe et al., 2019). Yet tissue collection nowadays is not routinely done in a way compatible with RNA and protein sequencing and analysis, mostly due to methodological difficulties and the increased costs of preserving tissues in such a manner (Supplementary Material). Corlett (2017) provides a comprehensive list of conservation problems with current or potential genomic solutions, and some of the most interesting ones include sequencing RNA to make better informed decisions when selecting populations for reintroduction, optimizing ex situ conservation efforts and assessing acclimation potential (see Table 1 in Corlett, 2017).
The Bat1K consortium is a pioneer in this respect, with the recent publication of a detailed methodological paper describing recommended best practices to collect tissues in manners compatible with all three "-omics" analyses (genomic, transcriptomic and proteomic) and even cell culture (Yohe et al., 2019). As the authors stated, the main motivation to develop this detailed protocol was "to maximize the amount of potential molecular and morphological data for each bat and suggest optimal ways to preserve tissues so they retain their value as new methods develop in the future." Many bat species are endangered, and bats live longer and produce offspring at much lower rates than other similar-sized mammals such as rodents or shrews. This makes bat populations slow to recover and particularly susceptible to specimen collection. The specimens and tissues that can be obtained nowadays are limited and for some species may be the last ever to be collected. Ensuring we maximize information from each specimen should be a priority.

CONCLUSION/FINAL REMARKS
Genetic or genomic samples can be used to establish benchmarks in time of organismal, evolutionary, and even population processes that may augment and surpass the value of "traditional" museum specimens. While biological collections have been sampling tissues for genetic analyses for decades, we consider it a priority that future collecting expeditions incorporate as one important objective the acquisition of samples that contribute to creating these "baselines" of genomic diversity. Though not exhaustive, the series of criteria that we proposed here can help define sampling priorities from now onward. Thoughtful collection of samples with respect to collection locations and populations of particular biological interest will not only serve this purpose but will certainly enhance the value of these samples over time.
We recommend that future collecting efforts consider not only the criteria we discussed here in relation to what and where to collect but also how. Careful planning of which tissues will be extracted and how they will be preserved (immediately and long term) can help anticipate the inevitable improvement in biotechnology and analytical techniques and minimize the types of analyses for which samples will become "obsolete." In addition to the technical aspects of collecting and preserving a sample, its value is strongly attached to the information surrounding its acquisition. Therefore, collection and curation should adhere to best practices for linking samples with detailed metadata (Eymann et al., 2016). Lastly, while the original samples are irreplaceable and therefore worth the effort and resources needed to preserve them properly, we also consider of great importance the long-term preservation and sharing of the knowledge derived from such samples (for example, by depositing sequences obtained in repositories such as GenBank) to develop complete and comprehensive benchmarks of the world's genetic and genomic diversity.

ETHICS STATEMENT
Written informed consent was obtained from the individuals appearing in Figure 1 for the publication of any potentially identifiable images included in this article.

AUTHOR CONTRIBUTIONS
NG and WR conceived the idea, wrote the manuscript, and approved it for publication. Both authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We would like to thank Irby Lovette and the Lovette Lab members for their useful comments and suggestions during the preparation of this manuscript. We would also like to thank two reviewers for their valuable comments and suggestions that significantly improved this manuscript. NG was supported by the Imogene Postdoctoral Teaching Fellowship, Cornell Lab of Ornithology. WR was supported by the College of Agricultural Sciences and the Bob and Phyllis Mace Professorship.