Benchmarking Taxonomic and Genetic Diversity After the Fact: Lessons Learned From the Catastrophic 2019–2020 Australian Bushfires

Environmental catastrophes are increasing in frequency and severity under climate change, and they substantially impact biodiversity. Recovery actions after catastrophes depend on prior benchmarking of biodiversity and that in turn minimally requires critical assessment of taxonomy and species-level diversity. Long-term recovery of species also requires an understanding of within-species diversity. Australia’s 2019–2020 bushfires were unprecedented in their extent and severity and impacted large portions of habitats that are not adapted to fire. Assessments of the fires’ impacts on vertebrates identified 114 species that were a high priority for management. In response, we compiled explicit information on taxonomic diversity and genetic diversity within fire-impacted vertebrates to provide to government agencies undertaking rapid conservation assessments. Here we discuss what we learned from our effort to benchmark pre-fire taxonomic and genetic diversity after the event. We identified a significant number of candidate species (genetic units that may be undescribed species), particularly in frogs and mammals. Reptiles and mammals also had high levels of intraspecific genetic structure relevant to conservation management. The first challenge was making published genetic data fit for purpose because original publications often focussed on a different question and did not provide raw sequence read data. Gaining access to analytical files and compiling appropriate individual metadata was also time-consuming. For many species, significant unpublished data was held by researchers. Identifying which data existed was challenging. For both published and unpublished data, substantial sampling gaps prevented areas of a species’ distribution being assigned to a conservation unit. Summarising sampling gaps across species revealed that many areas were poorly sampled across taxonomic groups. 
To resolve these issues and prepare responses to future catastrophes, we recommend that researchers embrace open data principles including providing detailed metadata. Governments need to invest in a skilled taxonomic workforce to document and describe biodiversity before an event and to assess its impacts afterward. Natural history collections should also target increasing their DNA collections based on sampling gaps and revise their collection strategies to increasingly take population-scale DNA samples in order to document within-species genetic diversity.


INTRODUCTION
Environmental catastrophes are becoming more common and intense due to climatic changes, such as increases in the number of days of extreme fire weather and increases in intense rainfall events. They will magnify impacts on species already subject to other threatening processes such as habitat fragmentation and invasive species (Coumou and Rahmstorf, 2012; Harris et al., 2018). Catastrophic events often affect huge areas, and in some cases, almost all of a particular ecosystem or species' distribution (Lande, 1993). Actions to promote recovery from large-scale events require two particular forms of biodiversity information: what was impacted, and how well it can rebound. While the spatial scale of impact can often be estimated from distribution and trait data (Legge et al., 2020; Ward et al., 2020), recovery is more complex to forecast. Long-term recovery needs accurate taxonomic information, and should incorporate information on genetic diversity in order to ensure the long-term persistence of recovered species (reviewed in Pierson et al., 2016).
Identifying what was impacted can be challenging when the description of biodiversity is incomplete. The presence of taxonomically unrecognised species-level diversity when coupled with loss of geographic populations (e.g., Ceballos et al., 2020) can lead to cryptic extinction (Boessenkool et al., 2009; Travouillon et al., 2019; White et al., 2019). Unrecognised species diversity is more likely to occur in low vagility organisms distributed across topographically complex biomes that have undergone regular habitat expansion and contraction over glacial cycles, which enables allopatric speciation (e.g., Hewitt, 2000). In these circumstances, species might not differ morphologically (Singhal et al., 2018), especially if mate choice is based on nonmorphological traits such as mating calls or pheromones. Where an event encompasses a region and a set of taxa for which these criteria apply, careful consideration of whether taxonomic recognition of species is complete and robust is needed for impact assessments and to prevent cryptic extinction.
Similarly, within-species diversity is important in assessing impacts and recovery from large-scale events. Genetic composition is considered an essential biodiversity variable (EBV; https://geobon.org/ebvs/what-are-ebvs/) for the management of biodiversity, and the maintenance and enhancement of genetic diversity is a key goal in the maintenance of global biodiversity (Convention on Biological Diversity, 2020). In particular, genetic EBVs focus on the maintenance of genetic variation within species and between populations, and the reduction of inbreeding to protect the long-term genetic health of biodiversity. A key genetic indicator suggested for inclusion as an EBV is the number of evolutionarily viable populations, i.e., those with an effective population size (Ne) above 500 (Hoban et al., 2020).
Assessing how genetic diversity across a species' range has been impacted is more complex than species-level spatial analyses of distribution data alone can describe (Hanson et al., 2020). Species often comprise discrete, definable genetic units having direct relevance to conservation management (Coates et al., 2018). These units range from populations within a meta-population, where each population is considered a Management Unit, to Evolutionarily Significant Units, which represent sets of distinct meta-populations that rarely admix with others (Moritz, 1994). These genetic units are also distinct in characteristics important to long-term persistence, including their genetic diversity (e.g., heterozygosity, allelic richness) and meta-population connectivity. Long-term recovery of species needs to prioritise the preservation of distinct conservation units while ensuring the genetic health of each independent unit.
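As a minimal sketch of how the indicator described above could be tallied, the following Python counts populations whose effective size exceeds the threshold of 500 quoted from Hoban et al. (2020). The function name, the population labels, and the Ne values are hypothetical and are used only for illustration; they are not data from this study.

```python
from typing import Mapping

# Threshold quoted in the text (Hoban et al., 2020).
NE_THRESHOLD = 500


def count_viable_populations(
    ne_by_population: Mapping[str, float],
    threshold: float = NE_THRESHOLD,
) -> int:
    """Count populations whose effective size is strictly above the threshold."""
    return sum(1 for ne in ne_by_population.values() if ne > threshold)


# Hypothetical Ne estimates, for illustration only.
example_ne = {"north": 1200.0, "central": 480.0, "south": 510.0, "island": 95.0}

viable = count_viable_populations(example_ne)
print(f"{viable} of {len(example_ne)} populations exceed Ne = {NE_THRESHOLD}")
```

In practice the Ne estimates themselves carry wide confidence intervals, so a count of this kind is best read alongside that uncertainty.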
Ideally, comprehensive information on the population genetic structure of species prior to a catastrophic event would enable assessments of immediate impact. Such information would also provide a benchmark for comparisons after the event. These data would then enable genetically guided restoration and translocation. However, these data do not exist for the vast majority of species on Earth. Where these data do exist, they may not be publicly available, or publicly databased sequences may be poorly georeferenced (Pope et al., 2015; Miraldo et al., 2016). Here we discuss our attempt to develop genetic benchmarking following the large-scale 2019-2020 bushfires in order to aid the recovery of vertebrate species in Australia.

THE AUSTRALIAN 2019-2020 BUSHFIRES
The Australian continent is often simplistically considered a bushfire-prone landscape in which the fauna and flora are well adapted to periodic, patchy fires. However, the bushfires of 2019-2020 were unprecedented in their extent (Figure 1a; Boer et al., 2020; Filkov et al., 2020) and severity, and burned some areas where fire is not part of ecosystem renewal, including rainforests (Ward et al., 2020; Godfree et al., 2021). Some of these wet forests burned for the first time in recorded history. The most fire-affected state, New South Wales, reported that more than 5.4 million hectares (∼14 M acres) burned, including 37% of the national park estate (State of NSW and Department of Planning, Industry and Environment, 2020). These fires significantly affected particular habitats, including more than 81% of the World Heritage listed Greater Blue Mountains, and 54% of the World Heritage listed Gondwana Rainforests. Burned regions include extensive forests along the Great Dividing Range of eastern Australia, which are highly differentiated from surrounding less mesic ecosystems (Byrne et al., 2011). Many of these wet forests were in decline prior to the fires due to a long history of habitat fragmentation and extensive drought (Lindenmayer et al., 2000; Bradshaw, 2012). As such they are home to many endemic and declining species. Conservative estimates suggest that over 1 billion mammals, birds, and reptiles were killed directly in the fires or in their aftermath, and that over 3 billion were impacted (van Eeden et al., 2020).

FIGURE 1 | The extent of the fires (a) is shown over a Google Earth satellite image. The fire extent (from the Commonwealth National Indicative Aggregated Fire Extent Dataset) is outlined in white, and dark green regions on the image represent pre-fire closed forests. Areas where conservation units could not be assigned across the 59 species assessed, due to a lack of genetic samples, are shown in panel (b) (from Catullo and Moritz, 2020). Colour indicates the number of species in a grid cell for which populations from that area could not be assigned to a conservation unit, with the fire extent shown in the polygons.
The forest habitats of the Great Dividing Range, and the associated coastal platform, form a series of highly structured biogeographical regions. Significant expansion and contraction of the forest habitat has been associated with Pleistocene glacial cycles (Byrne et al., 2011). A substantial number of studies identify high levels of inter- and intra-specific turnover at key biogeographic barriers along the range (reviewed in Chapple et al., 2011; Bryant and Krosch, 2016) and rainforest taxa show especially high local endemism (Rosauer et al., 2015). However, for many of the more latitudinally widespread species that are likely to be fire-impacted, spatial genetic studies have not been undertaken or have not yet been published. Therefore, it is difficult to accurately estimate the overall impact on species-level diversity, and genetic diversity within species.
In response to what has been widely considered a conservation emergency, the Commonwealth Department of Water and the Environment developed a draft framework to prioritise emergency action for all vertebrate species whose distributions were substantially bushfire-affected (Legge et al., 2020; Ward et al., 2020). This framework ranks species for conservation action based on the overlap of the species with fire, threat status prior to the fire, traits that influence during- and post-fire mortality, and the likelihood of species recovery. For example, mountain stream endemic frogs from the genus Philoria were ranked as a high priority due to a likely high fire impact (pre-fire conservation status of endangered, high level of fire overlap with the species' range, and potentially high mortality during and after fires) and low rate of recovery (long life spans, and low number of eggs per clutch). From this exercise, 114 species of vertebrate were rated as a high priority for urgent management intervention (Legge et al., 2020).
A key opportunity to advise on the recovery of bushfire-affected vertebrates arose as scientists within Australia were aware of taxonomic issues relevant to such species. These issues included "known unknowns": taxonomic species known by experts to be composite in some way, either comprising multiple candidate species (i.e., one or more potential undescribed species within a currently described species) or major genetic subdivisions such as Management Units. Also, potentially over-split species or subspecies were accorded inappropriate attention. In addition, given the scale of the conservation effort being planned across the range of the fires, there is significant value in the genetic health of species being incorporated in recovery plans, and in clearly defining conservation units for management and recovery teams. To this end, we organised experts across Australia to provide published and unpublished information to government agencies regarding:
• Taxonomic uncertainty, such as scientific support for subspecies,
• Undescribed species which needed inclusion in the formal assessment process,
• Conservation units within species where sufficient genetic data exist for this purpose, and
• Priority areas for further sample collection by species and region to better enable researchers to quantify the distribution of conservation units and species.

PRE-FIRE GENETIC BIODIVERSITY BENCHMARKING
Our primary goal was to provide individual assessments of the pre-fire taxonomy and spatial genetic diversity for each priority species, where genetic data exist. These assessments summarised the taxonomic status of species and subspecies, defined conservation units within each species, and reviewed available knowledge about genetic diversity within each conservation unit (Catullo and Moritz, 2020; now available at https://www.nespthreatenedspecies.edu.au/projects/genetic-assessment-of-priority-taxa-and-management-priorities and https://www.nespthreatenedspecies.edu.au/publications-and-tools/genetic-assessment-of-bushfire-impacted-vertebrate-species).
In the first step we worked with known taxon experts (see Acknowledgments section) to identify existing publications and unpublished data, and to identify additional researchers who may have relevant unpublished genetic data. Of the genetic data included in our assessments (Figure 2), 42% of species relied entirely on unpublished data held by participating researchers, and another 12% of species assessments relied on a combination of published and unpublished data. For all data sources, the evidence for multiple taxa (candidate species or ESUs) within described species was peer-reviewed at an expert workshop in April 2020 (Catullo and Moritz, 2020). Species were categorised as having sufficient data for initial assessment (N = 59), potential for multiple taxa but insufficient geographic sampling (N = 40), having no indication of strong spatial structure (N = 37), short-range endemics (N = 37), or insufficient data to form an opinion (N = 36). The relative proportion of unpublished data was highest for frogs and lowest for birds. Most of the datasets comprised mtDNA sequencing only (28%) or combined mtDNA and nuclear DNA markers (24%). High resolution nuclear DNA SNP screens were included in 28% of datasets, mostly for frogs. Many of these data are included in ongoing assessments of taxonomic boundaries in morphologically cryptic species complexes; it can take many years to generate the necessary spatial sampling and complementary genetic and phenotypic data.
Published data when available often did not address questions specific to this project, i.e., they did not define conservation units and assess levels of genetic diversity. Accordingly, the benchmarking effort for this project required reinterpretation of existing data, and substantial one-on-one engagement with taxon experts. Researchers were unanimously willing to provide their unpublished data and be identified as experts in the individual assessments. Researchers acting as experts were also asked to identify the correct conservation units across each species based on a set of standardised definitions (see Catullo and Moritz, 2020), and to review and approve final individual species assessments.
This assessment process resulted in the delineation of a substantial number of conservation units, ranging from undescribed species through to Management Units (Table 1). Within our initial assessment of 59 taxonomic species the expert group identified 29 undescribed or candidate species among the fire-affected mammals, reptiles, and frogs. These assessments identified, proportionally, the highest number of undescribed species in frogs, followed by mammals, then reptiles. Evolutionarily significant genetic structure below the species level (i.e., confirmed or candidate ESUs) was identified in a substantial proportion of mammals and reptiles, with lesser values for amphibians. Overall, more birds were identified as being overdescribed at subspecies level, the genetic differentiation of many bird subspecies being comparable to the genetic differentiation between management units in other taxonomic groups. There were significant biases between taxonomic groups, however. One bias was the number of species for which adequate spatial genetic data, published or unpublished, were available. The largest number of genetic datasets was available for terrestrial mammals (N = 22) and the fewest for birds (N = 5), for which there are fewer tissue samples available (but often many skins suitable for DNA analysis).

TABLE 1 | Values in each column identify the change in the number of each type of conservation unit from the number of species assessed. Negative values identify where previously described taxonomic units were not supported by genetic data. Conservation units are defined as known but undescribed species, clades that may represent undescribed species, subspecies, evolutionarily significant units, and management units. From Catullo and Moritz (2020).
Of the published studies used in our assessments, all but a few had genetic data available online (Figure 2). The greatest proportion of non-downloadable data was seen in mammals and reptiles. These data were mostly available on GenBank, and newer studies that utilised genomics had published data on Dryad or the Sequence Read Archive. Where data were published, there were still substantial challenges in accessing the required genetic data. Ideally, this would include georeferenced individuals and manipulatable results files such as phylogenetic trees. In most cases, georeferenced data on individuals are available, but often in a form that requires manual extraction from publications, and analytical outputs such as phylogenetic tree files are not available online.
Another significant challenge to this genetic benchmarking exercise was the high proportion of unpublished data that informs assessments of both taxonomic and intraspecific genetic diversity (Figure 2). While taxonomists have been very willing to provide unpublished data for the assessment of conservation units in target taxa, the primary challenge has been discovering whether unpublished genetic data already exists for a priority species, and which researcher has it. A necessary consequence of including unpublished data is that conservation assessments for such species were published in confidential appendices only available to agencies directly involved in the conservation effort, not to the general public as would be preferred.
During the process of spatially defining conservation units, there were significant areas where boundaries of conservation units relative to fire-impacted areas could not be defined due to geographic gaps in sampling. These areas of uncertainty are particularly important to our understanding of the confidence we can have in the conservation value of a geographic region. Therefore, we defined geographical areas of uncertainty for each species. Areas with a substantial number of undefined conservation units should be a priority for future field collection. To enable these collections, we first highlighted areas without DNA samples by spatially summarising the number of species with uncertain conservation unit assignment in each grid square (Figure 1b). Second, we provided lists of species that need collecting (i.e., were uncertain in their conservation unit designation) by protected area. Through this approach we are able to identify both priority areas for future collections and the priority species for collection in each area.
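The grid summary described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for Figure 1b: it assumes each area of uncertainty has already been reduced to representative points, it uses a simple latitude/longitude grid with a hypothetical 1 degree cell size, and the species labels and coordinates are invented.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]


def grid_cell(lon: float, lat: float, cell_size_deg: float = 1.0) -> Cell:
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(lon / cell_size_deg), math.floor(lat / cell_size_deg))


def count_uncertain_species(
    records: Iterable[Tuple[str, float, float]],
    cell_size_deg: float = 1.0,
) -> Dict[Cell, int]:
    """Count distinct species per grid cell.

    Each record is (species, longitude, latitude) for a location where the
    species could not be assigned to a conservation unit.
    """
    species_by_cell: Dict[Cell, Set[str]] = defaultdict(set)
    for species, lon, lat in records:
        species_by_cell[grid_cell(lon, lat, cell_size_deg)].add(species)
    return {cell: len(names) for cell, names in species_by_cell.items()}


# Hypothetical records, for illustration only.
uncertain_records = [
    ("species_A", 150.3, -33.6),
    ("species_B", 150.7, -33.2),
    ("species_A", 150.9, -33.9),  # same species, same cell: counted once
    ("species_A", 152.1, -30.4),
]

for cell, n_species in sorted(count_uncertain_species(uncertain_records).items()):
    print(cell, n_species)
```

Using a set per cell means a species is counted once per cell regardless of how many points it contributes, which matches the idea of counting species rather than records.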

DISCUSSION
The exercise of attempting to benchmark taxonomic and genetic diversity highlighted a number of important challenges to the effective and robust use of genetic diversity indicators (Hoban et al., 2020). However, the scale of previously unrecognised diversity we identified across the target species (Table 1) demonstrates the need for benchmarking genetic diversity for conservation and threatened species management. Identifying existing but unpublished datasets that were vital to describing diversity within many species was a significant challenge. Repurposing existing genetic datasets is not always straightforward, due to a combination of heterogeneous data types, variable completeness of spatial sampling, and incomplete access to the necessary data such as georeferenced locations (Pope et al., 2015; Miraldo et al., 2016). We also learned that the benchmarking exercise is worthwhile: despite the challenges associated with identifying and summarising the data, Commonwealth and state governments are now actively incorporating this genetic information into their ongoing conservation assessments. However, completing this work in a timeframe useful to the conservation efforts required multiple staff working virtually full time for almost six months. We believe our work, and improving processes around data availability and conservation assessment, will assist in conservation funds being targeted toward the most at-risk species, regardless of their current taxonomic status.

Unpublished and Missing Data
Despite Australia's rank as the world's fifth most megadiverse country (for comparison, the United States is the 16th; OECD, 2019), the taxonomic workforce has been in decline. This decline is explicitly linked to the prevalence of unpublished data and undescribed species, even in a well-populated region and in the well-studied vertebrates as considered here. Within Australia, the taxonomic workforce declined by 10 percent over the 25 years leading to 2017, during which time the Australian population increased by 40 percent (Taxonomy Decadal Plan Working Group, 2018). This lack of investment in a skilled workforce of sufficient size, relative to the scale of biodiversity, presents a significant roadblock to benchmarking biodiversity prior to and following a catastrophe. The level of undocumented biodiversity is likely significantly higher in groups such as invertebrates, plants, and fungi, all of which face potential cryptic extinction during a large-scale event. Investment from both state and Commonwealth governments in expanding and supporting a permanent taxonomic workforce would improve the ability to benchmark existing biodiversity, publish existing data, and to assess impacts following catastrophic events.
Australian natural history collections have been fundamental to any benchmarking of the genetic diversity of fire-impacted vertebrates. However, significant sampling gaps and low numbers of samples impeded genetic benchmarking for many species. While genotyping from vouchered specimens is becoming increasingly possible (Paplinska et al., 2011), the additional technical challenges mean these data need to be available at the time they are required. Museum collections can improve the ability to benchmark genetic diversity, especially in rarer species, through different but nonetheless complementary strategies of voucher acquisition and acquisition of samples for DNA collection. Ideally, museums primarily collect DNA samples from vouchered specimens. While this is clearly best practice for vouchering, there is significant benefit to benchmarking genetic diversity through the collection of non-lethal replicates in populations (see García and Robinson, 2021). This is particularly true for threatened species for which extensive vouchering is not advisable (and for which genetic data may be most useful). We suggest collections aim to sample at least 10 spatially spread sites from each conservation unit within a species, ideally with 10 or more non-related samples per site to allow for estimates of within-population diversity. Targeted sampling in areas where poor sampling exists across many species (Figure 1b) can make the collection exercise more cost effective. The effort to document the genetic diversity within species would also be supported by researchers providing subsamples of tissues to museums as standard practice.
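A collection could audit its holdings against the sampling target suggested above (at least 10 sites per conservation unit, with 10 or more unrelated samples per site) with something like the sketch below. The function name, the input layout, and the example holdings are hypothetical; the sketch also assumes relatedness has already been screened and does not test whether sites are spatially spread.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

MIN_SITES = 10             # sites per conservation unit, as suggested in the text
MIN_SAMPLES_PER_SITE = 10  # unrelated samples per site, as suggested in the text


def sampling_shortfall(
    samples: Iterable[Tuple[str, str]],
    min_sites: int = MIN_SITES,
    min_per_site: int = MIN_SAMPLES_PER_SITE,
) -> Dict[str, Tuple[int, int]]:
    """Summarise how far each conservation unit is from the sampling target.

    Each element of `samples` is (conservation_unit, site) for one unrelated
    individual. Returns, per unit, (adequately sampled sites, sites still needed).
    """
    per_site = Counter(samples)  # (unit, site) -> number of samples
    adequate_sites: Counter = Counter()
    units = set()
    for (unit, _site), n_samples in per_site.items():
        units.add(unit)
        if n_samples >= min_per_site:
            adequate_sites[unit] += 1
    return {
        unit: (adequate_sites[unit], max(0, min_sites - adequate_sites[unit]))
        for unit in units
    }


# Hypothetical holdings, for illustration only.
holdings = (
    [("ESU_1", "site_1")] * 10
    + [("ESU_1", "site_2")] * 4
    + [("ESU_2", "site_3")] * 12
)

for unit, (have, need) in sorted(sampling_shortfall(holdings).items()):
    print(f"{unit}: {have} adequately sampled site(s), {need} more needed")
```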

Data Reusability
Key to enabling future biodiversity benchmarking is the availability of genetic data under FAIR principles (findability, accessibility, interoperability, and reuse; Wilkinson et al., 2016), with appropriate and searchable metadata. Incomplete metadata in particular consistently frustrate efforts to quickly and bioinformatically assess diversity across geographic scales (Pope et al., 2015; Miraldo et al., 2016). Projects such as the Genomic Observatories Metadatabase (GEOME; Riginos et al., 2020) provide tools to improve uploading of effective sample metadata into DNA sequence repositories and we encourage their use. Useful analytical outputs such as phylogenetic tree files were generally not available, but should be provided through open data providers such as Dryad or TreeBase (Boettiger and Lang, 2012).
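To illustrate the kind of metadata completeness check that would make sequence records easier to reuse, the sketch below flags sample records that lack basic fields or carry impossible coordinates. The field names are hypothetical and do not follow any particular repository's schema; a real workflow would use the vocabulary required by the target repository.

```python
from typing import Any, List, Mapping

# Hypothetical minimum set of per-sample fields, for illustration only.
REQUIRED_FIELDS = ("sample_id", "species", "latitude", "longitude", "collection_date")


def metadata_problems(record: Mapping[str, Any]) -> List[str]:
    """Return the names of fields that are missing, empty, or out of range."""
    problems = [
        field for field in REQUIRED_FIELDS if record.get(field) in (None, "")
    ]
    lat = record.get("latitude")
    lon = record.get("longitude")
    # Present but impossible coordinates are as unusable as missing ones.
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("latitude")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("longitude")
    return problems


# Hypothetical records, for illustration only.
complete = {
    "sample_id": "X001",
    "species": "species_A",
    "latitude": -33.6,
    "longitude": 150.3,
    "collection_date": "2015-11-02",
}
incomplete = {"sample_id": "X002", "species": "species_A", "latitude": -133.6}

print(metadata_problems(complete))
print(metadata_problems(incomplete))
```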
Whether data are published or unpublished, a significant issue for recent research utilizing single nucleotide polymorphisms (SNPs) is the accessibility and reusability of data sets. As a workaround for the public dissemination of data, SNP data sets are often provided through supplementary materials or other file hosting sites. Where this is the case, it is often the final set of SNPs that is provided, not the raw sequence read data that would enable its repurposing for conservation questions. An additional issue with providing just SNPs is that different calling/filtering parameters generate inconsistent estimates of genetic diversity parameters (Wright et al., 2019), limiting reusability. In our case, existing datasets were often designed to test for admixture between two candidate species. If these data were to be used to assess genetic diversity within each species, each species would be inferred to have a marked deficiency of heterozygotes (i.e., Wahlund effect; De Meeûs, 2018), leading to downstream issues when estimating diversity parameters. For these data to be reusable, the ability to re-call SNPs from more homogeneous sets of individuals is required.
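The heterozygote deficiency mentioned above can be shown with a small worked example. The sketch assumes a single biallelic locus and two equally sized populations, each in Hardy-Weinberg equilibrium; the allele frequencies are hypothetical. When the two are analysed as one group, the observed heterozygosity is the mean of the within-population values, while the expected heterozygosity is computed from the pooled allele frequency, and the gap between them appears as a positive inbreeding coefficient.

```python
from typing import Tuple


def expected_heterozygosity(p: float) -> float:
    """Hardy-Weinberg expected heterozygosity at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def wahlund_demo(p1: float, p2: float) -> Tuple[float, float, float]:
    """Pool two equal-sized populations and compare heterozygosities.

    Returns (observed H in the pooled sample, expected H from the pooled
    allele frequency, apparent inbreeding coefficient F = 1 - Ho/He).
    """
    h_observed = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    p_pooled = (p1 + p2) / 2.0
    h_expected = expected_heterozygosity(p_pooled)
    if h_expected == 0.0:
        raise ValueError("locus is monomorphic in the pooled sample")
    return h_observed, h_expected, 1.0 - h_observed / h_expected


# Hypothetical allele frequencies in two strongly differentiated populations.
h_obs, h_exp, f = wahlund_demo(0.9, 0.1)
print(f"Ho = {h_obs:.2f}, He = {h_exp:.2f}, F = {f:.2f}")
# prints: Ho = 0.18, He = 0.50, F = 0.64
```

With identical allele frequencies in both populations the deficit vanishes, which is why re-calling SNPs within homogeneous sets of individuals matters for diversity estimates.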

Improving Assessments of Listing Status
Most jurisdictions assess the conservation status of species against the IUCN Red List criteria (IUCN, 2020). This coarse, species-level approach risks cryptic extinction of major components of genetic diversity and evolutionary heritage within species. In Australia, the Commonwealth Environment Protection and Biodiversity Conservation Act 1999 is able to recognise "important populations." These are populations that are necessary for long-term survival and recovery of a species, and the designation is applied for reasons such as protecting key source populations, protecting populations that are necessary for maintaining genetic diversity, and protecting populations near the limit of the species' range that may contain unique adaptive diversity. Assessing conservation status under both the IUCN Red List criteria and any regionally specific legislation can provide significant additional conservation benefits. In our initial assessment we assessed "important population" status for all ESUs or candidate species. For example, we recommended this designation for the source population of the endangered Broad-headed Snake (Hoplocephalus bungaroides) in heavily burnt Morton National Park, and for each ESU within the Platypus (Ornithorhynchus anatinus).
In summary, our recommendations to improve the ability of governments to create genetic benchmarking datasets that enable the recovery of species are:
1. Research scientists should embrace FAIR data principles (Wilkinson et al., 2016). In particular, this should include ensuring raw sequence data are available online in such a manner to enable their repurposing. These data should have accessible and integrated sample metadata including highly accurate georeferenced locality data. Publication of research should include providing analytical outputs such as phylogenetic tree files.
2. Analysis of conservation status should include assessments under the specific nation-based legislation that applies at and below the species level, in addition to species-level IUCN Red List assessments.
3. Governments should invest in a highly-skilled taxonomic workforce with the capability to describe biodiversity prior to the catastrophe, and to assist in monitoring and recovery following the event.
4. Museums and herbariums should work with ethics and scientific permitting agencies to revise collection missions to increase population-level DNA sampling as a key priority outcome, in order to document the genetic diversity of species through time.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
RC, RS, and LT compiled the data. RC and CM wrote the manuscript with editing by all other authors. All authors contributed to the article and approved the submitted version.

FUNDING
Funding for this project was provided by the National Environmental Science Programme Threatened Species Recovery Hub, Theme 8: Post bushfire recovery support, and the Centre for Biodiversity Analysis (https://biology.anu.edu.au/research/centres-units/centre-biodiversity-analysis).