Edited by: Jonathan J. Fong, Lingnan University, China
Reviewed by: Andreanna J. Welch, Durham University, United Kingdom; Mariana Lyra, São Paulo State University, Campus Rio Claro, Brazil
This article was submitted to Phylogenetics, Phylogenomics, and Systematics, a section of the journal Frontiers in Ecology and Evolution
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Historical DNA obtained from voucher specimens housed in natural history museums worldwide has allowed the study of elusive, rare, or even extinct species that in many cases are represented solely by museum holdings. This has increased the taxonomic representation of many taxa, led to the discovery of new species, and yielded stunning novel insights into the evolutionary history of cryptic and even undescribed species.
Museomics is a booming field that leverages the potential of natural history museums as a source of DNA (ancient DNA [aDNA]: naturally preserved, heavily degraded trace amounts with low-quality, low-quantity yields, usually between thousands and a million years old; historical DNA [hDNA]: fortuitously preserved in voucher specimens almost always collected during the last 200 years, highly degraded, with low-quality, low-quantity yields; and modern DNA [mDNA]: tissues stored frozen or in preservatives, usually of high DNA quality and quantity, although in some cases affected by the mode of preservation regardless of time) coupled with genomic methods and techniques (
This innovative tool has been applied to discover and delimit species (
In the current biodiversity crisis, the discovery and documentation of biodiversity on Earth should be a priority (
The Puebla deer mouse,
In this study, we show how museomics has revolutionized phylogenetic studies, improving our understanding of the biodiversity of our planet. Importantly, we demonstrate that holotype specimen data are crucial for confirming the accurate identification of poorly studied species, especially rare, extinct, or under-collected species such as
We obtained 12 samples (ca. 2 mm² of frozen internal-organ tissue or dry skin) from specimens deposited at the Smithsonian Institution’s National Museum of Natural History and the Museum of Texas Tech University (
We performed all laboratory work at the Center for Conservation Genomics (CCG), Smithsonian National Zoo and Conservation Biology Institute, Washington, DC. We extracted DNA from frozen-preserved internal organs (i.e., liver or muscle; hereafter, modern samples) in the modern laboratory at the CCG using a DNeasy Blood and Tissue Kit (Qiagen Inc., Valencia, CA, USA), following the manufacturer’s protocol. We conducted all pre-PCR steps for the historical samples in a laboratory at the CCG dedicated to processing historical and ancient DNA. We extracted DNA from historical samples (i.e., dry skin) using the silica column extraction protocol (
We prepared dual-indexed libraries using the Kapa HyperPrep kit (Roche Sequencing) with half-volume reactions, following the manufacturer’s protocol. For the holotype specimen, we prepared the library with the SRSLY PicoPlus NGS library prep kit (Claret Bioscience, LLC), according to the manufacturer’s protocol. We performed dual indexing PCR with TruSeq-style indices (
We also reanalyzed UCE and mitogenomes published by
We processed raw data, provided by the sequencing core, following the PHYLUCE v1.6.7 pipeline (
We performed two independent phylogenetic analyses using: (1) a concatenated dataset including all of our samples (
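The analyses below are run on matrices of different completeness (65%, 75%, 85%, and 95%), meaning that a locus is kept only if at least that percentage of taxa has a sequence for it. The following is a minimal Python sketch of that filtering rule, not PHYLUCE's own implementation; the locus names, taxon labels, and function names are hypothetical.

```python
def min_taxa_required(n_taxa: int, percent: int) -> int:
    """Smallest number of taxa a locus needs to reach `percent` completeness.

    Uses integer ceiling division to avoid floating-point rounding errors.
    """
    return -(-n_taxa * percent // 100)


def filter_loci_by_completeness(locus_taxa, all_taxa, percent):
    """Return (sorted) loci present in at least `percent`% of `all_taxa`.

    locus_taxa: dict mapping a hypothetical locus name to the set of taxa
    that have a sequence for that locus.
    """
    threshold = min_taxa_required(len(all_taxa), percent)
    return sorted(
        locus
        for locus, taxa in locus_taxa.items()
        if len(taxa & all_taxa) >= threshold
    )


# Hypothetical example with four taxa and three loci
taxa = {"A", "B", "C", "D"}
loci = {
    "uce-1": {"A", "B", "C", "D"},
    "uce-2": {"A", "B", "C"},
    "uce-3": {"A", "B"},
}
print(filter_loci_by_completeness(loci, taxa, 75))  # ['uce-1', 'uce-2']
```

Because stricter thresholds discard loci with missing taxa, the number of retained loci can only stay the same or drop as completeness rises.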
First, we conducted a Maximum Likelihood (ML) analysis, for both datasets and all levels of matrix completeness, using RAxML 8.12 (
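Bootstrap values on the ML trees come from repeating the tree search on pseudoreplicate alignments. This minimal Python sketch shows only the standard nonparametric resampling step, in which alignment columns are drawn with replacement. It assumes a simple dict-of-strings alignment and does not reproduce RAxML's tree search or any of its speed-ups; the function name is hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one nonparametric bootstrap pseudoreplicate.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Sites (columns) are sampled with replacement, and the same column
    indices are applied to every taxon so that sites remain homologous.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


aln = {"t1": "ACGT", "t2": "ACGA"}
replicate = bootstrap_alignment(aln, random.Random(1))
```

The bootstrap support for a clade is then the percentage of trees, each inferred from one such replicate, that contain that clade.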
We performed a Bayesian Inference (BI) analysis, with all levels of matrix completeness and both datasets, using MrBayes 3.2.6 (
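The posterior probabilities reported for the BI trees are, in essence, the fraction of post-burn-in MCMC samples in which a clade appears. The following minimal Python sketch illustrates that calculation. It assumes each sampled tree has already been reduced to a set of clades (frozensets of taxon names), and the 25% burn-in is an assumed default for illustration, not a setting reported in this study.

```python
def clade_posterior(sampled_trees, clade, burnin_fraction=0.25):
    """Fraction of post-burn-in sampled trees that contain `clade`.

    sampled_trees: list of trees in sampling order, each represented
    (hypothetically) as a set of clades, where a clade is a frozenset
    of taxon names.
    """
    start = int(len(sampled_trees) * burnin_fraction)  # discard burn-in
    kept = sampled_trees[start:]
    if not kept:
        raise ValueError("no trees left after burn-in")
    target = frozenset(clade)
    return sum(target in tree for tree in kept) / len(kept)


AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
trees = [{AC}, {AC}, {AB}, {AB}]
print(clade_posterior(trees, {"A", "B"}, burnin_fraction=0.5))  # 1.0
```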
Maximum Likelihood and Bayesian Inference analyses were performed without partitions (as mentioned above) and, on the 95% matrix of both datasets only, with partitions, to test whether partitioning affected the results and to account for heterogeneity in rates and patterns of molecular evolution within each UCE locus. First, the Sliding-Window Site Characteristics (SWSC) partitioning method based on site entropies (
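SWSC partitions each UCE locus according to per-site characteristics such as entropy, which is typically low in the conserved core and higher in the flanks. This minimal Python sketch computes only the Shannon entropy of each alignment column. The sliding-window search that SWSC runs over these values is not shown, and treating `-`, `?`, and `N` as missing data is an assumption of the sketch.

```python
import math
from collections import Counter


def site_entropies(alignment, ignore="-?N"):
    """Shannon entropy (in bits) of each column of an alignment.

    alignment: list of equal-length aligned sequences.
    Characters listed in `ignore` (gaps/ambiguities) are skipped.
    """
    entropies = []
    for i in range(len(alignment[0])):
        counts = Counter(
            seq[i].upper() for seq in alignment if seq[i].upper() not in ignore
        )
        total = sum(counts.values())
        if total == 0:
            entropies.append(0.0)  # column made entirely of missing data
            continue
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


aln = ["AAC", "AAT", "AGG", "ATT"]
print(site_entropies(aln))  # column 1 is invariant; columns 2 and 3 vary
```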
Finally, we used the dataset without the holotype of
We analyzed read quality of the FASTQ format files using FastQC v0.11.5 (
We aligned sequences with MAFFT 7.45 plug-in (
We conducted a BI analysis, on a partitioned dataset, using MrBayes 3.2.6 (
DNA damage patterns were evaluated for the historical samples with mapDamage2.0 (
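The characteristic damage signal in historical DNA is an excess of C→T substitutions near the 5′ ends of reads, caused by cytosine deamination. The following minimal Python sketch tallies that frequency by read position. It is not mapDamage's implementation, which works on BAM alignments and also reports G→A changes at 3′ ends, and it assumes that each read is already aligned base-for-base (no indels) to its reference segment on the same strand.

```python
def c_to_t_by_position(pairs, n_positions=10):
    """Per-position C->T mismatch frequency counted from the 5' end of reads.

    pairs: list of (reference, read) strings aligned base-for-base.
    Returns, for each of the first n_positions, the fraction of reference
    C sites read as T (None where no reference C was observed).
    """
    c_total = [0] * n_positions
    c_to_t = [0] * n_positions
    for ref, read in pairs:
        for i, (r, q) in enumerate(zip(ref[:n_positions], read[:n_positions])):
            if r == "C":
                c_total[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    return [t / c if c else None for t, c in zip(c_to_t, c_total)]


# Hypothetical reads showing damage concentrated at the first position
pairs = [("CCA", "TCA"), ("CAC", "TAC"), ("ACC", "ACT")]
print(c_to_t_by_position(pairs, n_positions=3))  # [1.0, 0.0, 0.5]
```

A damage signal would appear as high values at the first positions that decay toward the interior of the read.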
We estimated molecular dates of divergence using Bayesian MCMC searches implemented in BEAST2 v2.6.6 (
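Uncertainty in Bayesian node ages is usually summarized as a 95% highest posterior density (HPD) interval, the narrowest interval that contains 95% of the sampled values. This generic Python sketch computes an HPD interval from a list of MCMC samples, such as node ages drawn from a BEAST2 log. It is not the code used by BEAST2's companion tools, and the sample values are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior samples (HPD)."""
    if not samples:
        raise ValueError("need at least one sample")
    xs = sorted(samples)
    n = len(xs)
    # Number of consecutive sorted samples the interval must span; the tiny
    # tolerance guards against floating-point products like 18.999999...
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k sorted samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


ages = [10.0, 1.0, 2.0, 3.0]  # hypothetical node ages (Ma)
print(hpd_interval(ages, mass=0.75))  # (1.0, 3.0)
```

Unlike an equal-tailed interval, the HPD interval shifts toward where samples are densest, which matters for the skewed age distributions common in dating analyses.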
We also estimated the divergence times on the complete mitogenomes dataset. First, we obtained the best model and partition scheme in PartitionFinder 2.1.1 (
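PartitionFinder selects models and partition schemes with an information criterion such as AICc or BIC. The text does not state which criterion was used, so BIC is chosen here only as an example. This minimal Python sketch ranks candidate schemes by BIC, taking n as the number of alignment sites (a common convention). The scheme names, log-likelihoods, and parameter counts are hypothetical and do not come from this study.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k*ln(n) - 2*lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def best_by_bic(candidates, n_sites):
    """candidates: dict scheme name -> (lnL, number of free parameters)."""
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))


# Hypothetical schemes: extra parameters must buy enough likelihood
schemes = {"one_partition": (-10000.0, 10), "six_partitions": (-9900.0, 60)}
print(best_by_bic(schemes, n_sites=16000))  # 'one_partition'
```

In this example the six-partition scheme improves lnL by 100 units, which does not offset the ln(n) penalty on its 50 extra parameters. A larger likelihood gain would reverse the choice.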
We visualized all phylogenetic and dated trees from the UCE and mitogenomes datasets in FigTree 1.4.4
We successfully sequenced UCEs (raw data are available in GenBank under BioProject
Trinity assemblies yielded an average of 24,543 contigs per sample (min = 2,056; max = 87,428) for historical samples and an average of 197,503 contigs per sample (min = 43,081; max = 450,450) for modern samples. We recovered 4,406 UCE loci in the incomplete matrix (
We tested topologies with different levels of missing data for: (a) complete dataset (
Maximum Likelihood and Bayesian Inference analyses for both datasets (
Ultraconserved elements (UCE) phylogenetic trees constructed using Bayesian Inference and Maximum Likelihood with and without partitions. Trees from all analyses yielded identical topologies. Nodal support is denoted with posterior probability/bootstrap values (numbers above the branches indicate results without partitions, those below with partitions).
The species tree analysis, with all levels of matrix completeness and the dataset without the holotype specimen, estimated the same topology from all matrices (
ASTRAL species tree estimation based on different levels of matrix completeness (65%, 3,649 UCE loci; 75%, 3,361 UCE loci; 85%, 2,155 UCE loci; and 95%, 417 UCE loci) and the
We recovered near-complete mitogenome sequences for all samples, including the holotype specimen of
Mitogenome phylogenies based on Bayesian Inference (BI) and Maximum Likelihood (ML). Nodal support is provided with posterior probability and bootstrap values, respectively. The pink block highlights the phylogenetic position of the
In addition, for every species represented by both a mitogenome generated in this study and one obtained from GenBank, the two sequences were very similar and clustered together in our phylogenetic analysis. This allowed us to corroborate the taxonomic identity of the samples using voucher specimens deposited in scientific collections. Finally, the mapDamage2.0 analysis showed a weak signal of DNA damage typical of historical DNA (
For the UCE dataset, the analysis estimating the time to the most recent common ancestor (TMRCA) indicated that the divergence between
Divergence dated nuclear phylogeny based on 417 UCE loci (95% matrix,
For the mitogenome dataset with six partitions, we estimated the split between
Divergence-dated whole mitochondrial genome phylogeny. Dates above the branches are provided in millions of years. Blue horizontal bars and numbers below the branches show the 95% credibility intervals. The pink block highlights the phylogenetic position of the
All of our ML, BI, and species tree analyses, with both mitochondrial and nuclear datasets, strongly supported that the Puebla deer mouse,
To better understand the phylogenetic position of
To date, no phylogenetic hypothesis has ever suggested that the genus
A recent study by
Even though the objectives of this study were not to further investigate the phylogenetic relationships within the genus
The relationship between
In general, our nuclear and mitochondrial phylogenetic trees largely mirror the mitogenome trees of
Our divergence time estimates (based on separate UCE and mitogenome datasets) resulted in similar dates (
We dated three late Miocene – Pliocene events: the divergence between
Similar divergence times have been found in other studies of
The complexity of elucidating the evolutionary history of
Evidence of environmental fluctuations and of corridors that favored movement across the landscape at that time, followed by post-glacial isolation, strongly supports the role of Pleistocene climate changes in the diversification process of many taxa (
The case of
Throughout this manuscript, we have repeatedly emphasized the value and importance of natural history museums, and of the specimens currently housed in their collections, for conducting a wide range of cutting-edge research as well as more traditional studies. However, we also need to highlight and advocate for continued specimen collecting, to keep building the scientific heritage of the collections in the coming years and to preserve a record of the planet's historical biodiversity for future generations of researchers and for society in general. From a general perspective,
The raw data generated for this study can be found in GenBank under BioProject:
Ethical review and approval were not required for the animal study because we exclusively used museum specimens deposited in scientific collections. All destructive sampling requests of the museum specimens used in this study were approved by the destructive sampling committee of those museums. We also used publicly available data on GenBank.
SC-R, CE, and JM secured funding and designed the study. SC-R and MH performed the specimen sampling. SC-R conducted the lab experiments, analyzed and archived the data, produced the figures, and wrote the manuscript with contributions from all co-authors. All authors read and approved the submitted version.
SC-R received a fellowship from the Smithsonian-Mason School of Conservation and George Mason University. Research funding was provided by the Center for Conservation Genomics, Smithsonian Conservation Biology Institute, Smithsonian-Mason School of Conservation, and George Mason University. This article was funded in part by the George Mason University Libraries Open Access Publishing Fund.
We especially thank the specimen collectors, collection managers, curators, and all museum staff of the Museum of Texas Tech University (TTU) and the Smithsonian Institution’s National Museum of Natural History (NMNH) who granted the destructive sampling of museum specimens and provided tissue sample loans.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
The Supplementary Material for this article can be found online at: