Single Cell RNA Sequencing Reveals Deep Homology of Dental Cell Types Across Vertebrates

Like most mammals, humans replace their teeth once throughout their lives and have limited regenerative capabilities. In contrast, mice continually renew tissues lost due to gnawing through a well characterized population of stem cells on the labial surface of the incisor. Most non-mammalian vertebrates replace teeth throughout life; the cellular and molecular mechanisms of successional tooth replacement are largely unknown. Here we use single nuclei RNA sequencing (snRNA-seq) of replacement teeth and adjacent oral lamina in Lake Malawi cichlids, species with lifelong whole–tooth replacement, to make two main discoveries. First, despite hundreds of millions of years of evolution, we demonstrate conservation of cell type gene expression across vertebrate teeth (fish, mouse, human). Second, we used an approach that combines marker gene expression and developmental potential of dental cells to uncover the transcriptional signature of stem-like cells in regenerating teeth. Our work underscores the importance of a comparative framework in the study of vertebrate oral and regenerative biology.


INTRODUCTION
Teeth are ancient structures whose presence in the fossil record chronicles the half billion-year history of vertebrates. Despite deep origins and conservation of the unit tooth, dental patterns vary widely amongst vertebrate groups. Teeth form in single or multiple rows on oral jaws and throughout the pharynx in many non-mammalian groups; birds lack teeth altogether, while mammals possess a single row of differently shaped teeth on the jaw margin. During early development, teeth are formed by an interaction initiated by the ectoderm [or endoderm (1)(2)(3)] of the pharyngeal arches with neural crest-derived ectomesenchyme. In humans, primordial progenitor populations disappear soon after development is complete. Humans and many mammal groups are termed diphyodonts because they have two generations of teeth (deciduous and permanent). While, mice and related rodents are termed monophyodonts and possess just one set of teeth throughout life. Mice do not replace teeth, but they do regenerate dental tissues from wellcharacterized stem cell populations on the labial surface of the incisor enamel and in proximity to the neurovascular bundle in pulp mesenchyme (4,5). In contrast, Lake Malawi cichlids and most non-mammalian vertebrates are called polyphyodonts, because they possess a dentition capable of whole-tooth replacement throughout life (6). Successive generations of teeth in polyphyodonts, the replacement teeth of cichlids, arise from a persistent dental lamina, a band of tissue containing stem cells (7)(8)(9)(10) formed by invagination of the oral epithelium into the dental mesenchyme during odontogenesis. This dental lamina (also called successional lamina) has been documented to persist through adulthood in various non-mammalian vertebrates (11).
Given the long history of teeth in the fossil record as well as the diversity of dental patterning and regenerative capacities noted above, it is of keen interest to know which features of teeth are evolutionary conserved, and which features are labile on evolutionarily deep and shallow timescales (12). Notably, the molecular tool kit involved in tooth initiation is highly conserved across vertebrates from sharks to bony fishes to reptiles to mammals, regardless of tooth location in the pharynx or on the oral jaw (13)(14)(15)(16). This observation has been supported by spatial gene expression analysis in dental tissues. Despite this deep homology of tooth initiation, few comparative studies have addressed the cellular heterogeneity in vertebrate teeth (17). Recently, Krivanek and colleagues used single cell RNAsequencing to produce a glimpse into the homology of mouse and human dental cell types (18). Similar cell types were found in human and mouse teeth with greater divergence observed in specific subpopulations of molar pulp. Yet, to date, no broader cellular-level comparison across vertebrates has been achieved.
Despite species-or group-specific attributes of dental regeneration and tooth replacement, recent evidence supports the deployment of common molecular signals. For instance, sox2 is associated with epithelial stem cells in the dentitions of sharks, bony fishes, reptiles and mammals (7). Similarly, we identified a putative quiescent population of Celsr1 + stem cells in the dental pulp mesenchyme in both Lake Malawi cichlids and the mouse incisor (19). Upon clipping-induced damage to the mouse incisor, this quiescent pool of cells replenished dental mesenchymal stem cells required for accelerated growth (4,20). In Lake Malawi cichlids, slow-cycling BrdU + /celsr1 + cells are found in epithelial and mesenchymal zones of label retaining cells in replacement teeth (19).
In this report, we used snRNA-seq to explore the cellular profiles of replacement teeth and associated tissues in Lake Malawi cichlids. We chose to isolate and sequence single nuclei instead of single cells because we wanted the flexibility of freezing dissected samples before library preparation and we wished to avoid dissociation-induced gene expression associated with standard single cell protocols (21,22). We aimed to (i) characterize the cell types of replacement teeth and associated oral lamina, (ii) compare the transcriptional features of these cell types to those from different types of teeth in mice and human; and (iii) identify the gene expression signature of cells that may contribute to dental regeneration and whole-tooth replacement.

Cellular Heterogeneity of Cichlid Replacement Teeth and Oral Lamina
To identify cell populations associated with replacement teeth in the jaws of Lake Malawi cichlids, we performed single nuclei RNA-sequencing on many dissected replacement teeth and the adjacent soft jaw tissue, containing taste buds and the oral lamina, from Maylandia zebra individuals (see Methods). Replacement teeth, which undergo one-for-one substitution of the functional tooth, are anatomically linked to taste buds located in the adjacent oral epithelium ( Figure 1A). Tastebuds undergo continuous renewal, are co-localized with teeth throughout the oro-pharynx, share genetic loci regulating their co-density and likely share competent epithelium and mesenchyme during development and regeneration (7,23). Because replacement teeth and the soft jaw tissue represent samples of different hardness and composition, we kept them separate for tissue processing and library preparation. Nuclei were FACS sorted using DAPI, libraries were prepared using 10x Genomics kist, and the samples were sequenced using Illumina chemistry on a NextSeq 500. Reads were filtered and aligned to the M. zebra genome using CellRanger. Next, the resulting gene expression matrix was further filtered to remove poor quality data and nuclei were clustered based on their transcriptional profiles in Seurat (24). Seven hundred and thirty nine nuclei were recovered from replacement teeth that met our quality filters with a mean of 326 genes expressed and 303 unique molecular identifiers (UMI) per nucleus. For the soft jaw tissue, 1,296 nuclei were recovered that met our quality filters with a mean of 721 genes expressed and 1,098 UMI per nucleus (Supplementary Figure 3).
Notably, nuclei from the two tissues were transcriptionally distinct from each other, with krt5 and krt15 showing the greatest differences. Nuclei from replacement teeth can be distinguished from those from surrounding soft tissue by krt5 with 91.2% accuracy (Figures 1B,C). krt5 was expressed by 90.4% of nuclei from the soft jaw tissue and 7.57% of nuclei from the replacement tooth (average log fold change = 3.59; Bonferroni adjusted p-value = 7.12e-242). In addition, krt15 could be used to distinguish the two tissues with 81.3% accuracy (Figures 1B,C). krt15 was expressed by 71.4% of nuclei from the soft jaw tissue and 1.35% of nuclei from the replacement teeth (average log fold change = 4.32; Bonferroni adjusted p-value = 3.30e−178). Using the presence of either krt5 or krt15 to classify nuclei resulted in 92.8% accuracy, enabling the potential to separate these two tissues using either of the genes or their combination. This may be important for future experiments wherein samples could be pooled prior to processing and nuclei could then be separated by the presence/absence of these keratin genes. We identified cell types for each tissue type separately through a cluster detection function in Seurat, and then manual annotation of cell type based on cluster marker gene expression.
In the replacement tooth sample, we detected 10 nuclei clusters representing 5 broad cell types: dental epithelial, dental mesenchymal, glial, endothelial and immune ( Figure 1D). Endothelial cells were characterized by greater expression of tie1; a component of the angiopoietin/Tie (ANG/Tie) receptor system that controls vascular remodeling and inflammation (25), ptprb; a gene which encodes receptor-type tyrosine-protein phosphatase beta (VE-PTP), an enzyme essential for blood vessel development (26,27), pecam1; or CD31, a molecule expressed throughout vascular cells whilst being largely concentrated at endothelial cell-cell junctions (28), and robo4; a cell-surface receptor that has been shown to mediate attraction signaling mechanisms via RhoGTPase, which are essential for vascular guidance in vertebrates (29)-along with genes such as tnsa, ebf1 and cdh5 which have diverse roles in vascular biology (30).
Dental epithelial nuclei were marked by increased expression of bicoid-related homeodomain factors pitx1 and pitx2, which FIGURE 1 | (A) Illustration of a sagittal view of the cichlid jaw, original artwork modified from reference (7). The dashed regions represent the tissue collected for our samples, replacement tooth (circle) and adjacent soft jaw tissue (triangle). The legend depicts the color representing each of the oral structures: taste buds in blue, epithelium in purple, functional tooth in magenta, replacement tooth in light pink, ossified tissue (crypt bone) in green, and label retaining cells in orange. (B,C) UMAP plot of the normalized expression of krt5 and krt15 in the replacement tooth and jaw, respectively. (D) UMAP plot of the annotated cell types in the replacement tooth. (E) UMAP plot of the annotated cell types in the adjacent soft jaw tissue. (F,G) Gene markers used to identify cell types in the replacement tooth and jaw, respectively. Average relative expression relative expression of the markers in each cell type is represented by color, red denoting greater relative expression and blue denoting lower relative expression. The size of the point represents the percent of cells in the cell type that express the marker.
are predominantly restricted to the dental epithelium during odontogenesis (31,32) whilst playing a key role in the odontogenic homeobox code for tooth patterning (33); shha, a protein coding gene whose expression has been well documented in stellate reticulum, stratum intermedium, inner, outer dental epithelium and epithelial-derived ameloblasts within the dental architecture (18,34); odam, a odontogenic ameloblast-associated protein (18,35); itgb3b, a gene that encodes integrin beta-3 (β3) or CD61 protein which is predominantly expressed in the developing epithelial tooth germ and adult dental lamina (36); and sema3fb, a member of the semaphorin gene family expressed in the epithelium of developing and adult teeth (37,38).
The glial population of nuclei was identified by expression of tcf7, tmtc2b, and slit3; genes which have been deemed important in maintaining neurogenesis, neural homeostasis, myelin formation in the peripheral nervous system, and provide support to neurons. Immune nuclei were marked by expression of hbb, wdr73, hbe1a, and hbe1b. Finally, dental mesenchymal nuclei were marked by increased expression of plod2, Procollagen-Lysine,2-Oxoglutarate 5-Dioxygenase 2, a gene which is involved in the regulation of dentinogenic differentiation of dental pulp stem cells (39,40); sox5, which belongs to the SRY-related HMG-box family of transcription factors having a key role in mesenchymal stem cell differentiation (41) and predominantly expressed in the dental mesenchyme (42); mmp16; a member of matrix metalloproteinase family whose members are involved in extracellular matrix reorganization in various cellular processes with mmp16 particularly being differentially expressed in dental follicles (43); bmp6, smad6, sall1a, smpd3 and dkk1a, which are well established pre-odontoblast makers (18) (Figure 1F).
Pathway enrichment analysis was performed on upregulated differentially expressed genes (DEGs) marking each cell type in the replacement tooth (Supplementary Separately, nuclei from the soft jaw tissue formed twelve clusters that represent six major cell types: oral epithelium, mesenchymal cells, pigmented cells, immature taste buds, mature taste buds, and immune cells ( Figure 1E). Mature taste bud nuclei were primarily defined by expression of calb2a (calretinin); a calcium signaling regulator, a marker which has been previously shown to be predominantly expressed in mature cichlid taste buds (7), trpm5; a gene which encodes a transient calcium channel receptor of the TRP subfamily highly expressed in the mammalian tongue taste buds where it plays a key role in perception of taste (44), avil; a gene which encodes a calcium regulated actin binding protein highly expressed in primate circumvallate taste buds (45), hcn; which encodes hyperpolarization-activated cyclic nucleotidegated (HCN) channels which are integral membrane proteins shown to be present in taste buds where they contribute to sour perception (46,47), and col2a1; which encodes type II collagen. Immature taste bud nuclei exhibited increased expression of sox2, which distinguishes progenitor taste bud cells from adult taste buds (7), chl1b; a gene which encodes a cell adhesion molecule expressed during taste bud development in cichlids (7), kif26ba, spire2, and calcb genes which are associated with budding and tube formation (48). Epithelial nuclei were marked by the expression of keratins; krt5 and krt15, integrin; itga6b, epcam; a epithelial adhesion molecule, and prominent epithelial marker pitx1 which has been demonstrated to be expressed in epithelial and mucosal surfaces across species along with tp63, a prominent epithelial stem cell marker (49). A population of pigmented cells was primarily characterized by the expression of mitfa; which regulates the function of pigment producing cells; melanocytes (50), mrc1 (or melanocortin receptor-1); a membrane surface receptor predominantly expressed by melanocytes (51), slc43a2a; associated with melatonin metabolism (52), and csf1ra; colony stimulating factor receptor 1-known for its role in zebrafish pigmentation (53).
Immune cell nuclei were identified by expression of tox2; a transcription factor which drives T cell function and development by regulating chromatin modeling (54), sh2d3ca (also known as Shep1); gene which encodes mediator proteins controlling cell signaling, deemed essential for immune response via the BCAR1-CRK-RAPGEF1 pathway (55) and B-cell function, along with genes prkd3, skap1, and jak3. Finally, mesenchymal cells had increased expression of genes col6a3, col5a1, col6a2, col6a1 which encode collagen V and VI localized within mammalian gingival mucosa (56), mesenchymal stem markers sfrp2, smoc1, piezo2, along with epha3 and prrx1b which contribute to epithelial to mesenchymal transition in tumors (57) ( Figure 1G).

Deep Homology of Gene Expression Within Dental Cell Types Across Vertebrates
We aimed to compare the transcriptional profiles of cell types within the dentitions of multiple vertebrate species. Comparison of single cell RNA-seq datasets is an area of active research (24, 58, 59) but we found that existing tools do not easily accommodate species as distantly related as fishes and mammals, sequenced with different approaches to different depths per cell or nucleus. Therefore, we explored an approach that compares differentially expressed genes characteristic of cell types found by independent study. High quality dental single cell RNA-seq datasets from mice and humans have been published recently, facilitating a transcriptional comparison that spans hundreds of millions of years of evolutionary distance. From Krivanek et al. (18), we gathered scRNA-seq datasets originating from the mouse incisor (MI), mouse incisor and mouse molar (MIM), and human molar (HM). We computed DEGs from the dental cell types identified by Krivanek and colleagues and then compared mammal dental cell type DEGs to those from cichlid replacement teeth (CT) and the adjacent soft jaw tissue (CJ). The heat map presented in Figure 2A illustrates the general approach and results from this analysis.
As expected, the greatest degree of overlap in DEGs was between cell types in the two mouse datasets, MI and MIM. There is lower overlap of DEGs in the cichlid cell types compared with the mammalian datasets and with human compared to mouse. This is expected due to the evolutionary relationships of species involved. To identify the mammalian dental cell type(s) with the closest overlap compared to cichlid cell types, we subset by mammalian dataset and created z-scores for each cichlid cell type (Figures 2B-D). DEGs for endothelial cells for the mammalian datasets match well and distinctly with cichlid endothelial nuclei. The DEGs for cichlid mesenchymal nuclei match most closely with pulp cells, PDL, odontoblasts, and alveolar osteocytes. Cichlid immune cell nuclei match most closely with immune cells, macrophages, Lyve1 macrophages, lymphocytes, and innate leukocytes. Cichlid epithelial nuclei show the most distinct overlap in the mouse incisor, where they overlap to the greatest degree with ameloblasts. The cichlid cell type we annotate as glia tends not to match with expected corresponding cell types in the mammal datasets.
Next, we sought additional specificity in the matching of mammalian dental cell types to those in the cichlid dataset. We found the top 100 upregulated DEGs exclusive to each mammal cell type per dataset. Then, we calculated the cumulative expression of these markers in each of the cichlid cell types and scaled by row ( Figure 2E). This expression score was typically the highest for corresponding cell types across data sets. For example, the mammalian cell types with the highest expression score in the mesenchymal clusters of cichlids were "pulp cells", "PDL", and "odontoblasts." Similarly, cichlid immune clusters had the highest expression score with cell types "immune cells, " "macrophages, " and "lymphocytes." The cichlid epithelial clusters had the highest match with mammalian "epithelial" cell types.
We next investigated the identity of mammal cell type markers with conserved expression in cichlid cell types. Here we present the top five of these conserved vertebrate dental cell type markers: It is notable that between 13 and 60% of upregulated DEGs exclusive to mammalian dental cell types were observed to be expressed in corresponding cichlid cell types. Over all analyses, cichlid glial cells show the lowest degree of conservation. This is an indication that this cell type has evolved between mammals and fishes, but also may indicate that we require more complete sampling of this cell type in cichlid teeth. Conversely, endothelial cells show the highest degree of conservation between cichlids and the mammalian species examined. Taken together, we find deep homology in the transcriptional profiles of dental cell types for vertebrate species that span hundreds of millions of years of evolutionary divergence.

The Transcriptional Signature of Stem-Like Cells in Regenerating Teeth
Having demonstrated conserved gene expression profiles amongst vertebrate dental cell types, we next asked whether we could identify a transcriptional signature of stem-like cells in vertebrate regenerating dentitions. One traditional approach to identify quiescent and cycling stem-like cells in tissue is to assay for the presence of specific molecular markers. This approach is typically aided by anatomical context (e.g., from histological sections). By contrast to the traditional approach, single-cell and single-nucleus RNA-seq datasets are much more sensitive and do not benefit from anatomical context. It is trivial to sort single cells in their clusters and find those expressing dental stem cell markers (e.g., Sox2, Gli1, Celsr1). We did so and observed cells expressing markers of interest in nearly all clusters, in many cases in cells from differentiated cell types. We reasoned that most, if not all, markers of stem-like cells are pleiotropic and therefore their expression in single cells alone is a poor indicator of a stem-like state. Therefore, we combined analysis of stem-like markers in cells with a computational approach to estimate the developmental potential of a cell. Below, we illustrate this strategy with Celsr1, a dental stem marker previously described in both cichlids and mice (19).
CytoTRACE (60) outperforms other computational approaches to identify quiescent or stem-like cells and is  (Figures 3A-D  and Supplementary Figure 6).
As proof of concept in dental cells, we applied CytoTRACE to clusters along an ameloblast differentiation trajectory annotated by Krivanek and colleagues (18). We found strong alignment between CytoTRACE score and the predicted differentiation state of the cell (Figure 3A). For instance, mature ameloblasts have low CytoTRACE scores, Mki67 + cycling cells had high CytoTRACE scores and Sfrp5 + progenitor cells were intermediate, in the range of 0.6-0.75.
We then applied CytoTRACE to all cichlid and mammalian dental datasets and subset cells into celsr1/Celsr1+ and celsr1/Celsr1-populations. In all dental datasets, Celsr1 + cells were found not only in a wide variety of cell types, but also with a wide range of CytoTRACE scores (Figures 3B-D and  Supplementary Figure 6). Following previous observations that CytoTRACE scores correlate with the differentiation state of the cell, we generated three equally sized bins of Celsr1 + cells based on CytoTRACE score: low, medium, and high. To enhance statistical power, we combined the following datasets from vertebrates with regenerative dental capabilities: CT, CJ, and MIMI.
We performed differential gene expression analysis of Celsr1 + cells by CytoTRACE bin (Figure 3E) and found a total of 2,616 DEGs that met thresholds described in the Methods. Of those, 1,602 DEGs were found in the high bin, 78 were found in the medium, and 936 were found in the low. In the high bin 97% of DEGs were upregulated, while in the low bin, 83% of DEGs were downregulated. High CytoTRACE scores are defined by high transcriptional diversity, which is also associated with high gene counts, while the converse is true of low CytoTRACE scores (Supplementary Figure 7). Greater gene counts in general makes it more likely that any given gene will be upregulated and this likely explains the observed relationship between CytoTRACE bin and the direction of effects for DEGs.
As expected, we observed classic proliferation marker genes to be high-bin DEGs, such as Mki67  (Figure 3F).
Our approach of pairing a molecular marker (i.e., Celsr1) with a metric of developmental potential for each cell identifies transcriptional signatures of cells from different differentiation states (i.e., bins). To gauge the degree of specificity of these transcriptional profiles for Celsr1 + cells, we carried out an identical analysis for Celsr1 − cells. We observed 12,323 DEGs across high, medium, and low CytoTRACE bins. By inspection, we found DEGs present across the bins of Celsr1 + cells, but not Celsr1 − cells. Accordingly, 1,271 DEGs (862 in the high, 68 in the medium, and 341 in the low bin; Supplementary Table 5) were specific to the Celsr1 + cell population. These data indicate that Celsr1 + cells, binned by scores of developmental potential, contain transcriptional profiles distinct by bin, and distinct from cells not expressing Celsr1.
To assess this point directly, we identified DEGs between Celsr1 + and Celsr1 − cells in each of the high, medium and low CytoTRACE bins (Figure 4A). Notably, the vast majority (4,799 out of 5,972; 80%) of total DEGs were upregulated in Celsr1 + cells (Supplementary Tables 6, 7). Functional enrichment of DEGs from this direct comparison (Figure 4B) are similar to the analysis by bin for Celsr1 + cells only (Figure 3F). We note that classical markers of proliferation (Mki67, Cdk4, Pcna, Mcm2) are differentially expressed in this direct comparison of Celsr1 + vs. Celsr1 − cells for the high bin; and markers of quiescence (Sox9, Notch2, TNC, Ctnnd2) are likewise differentially expressed in the comparison of Celsr1 + vs. Celsr1 − cells for the medium bin. Taken together, we have identified distinct transcriptional profiles of Celsr1 + cells sampled from vertebrate teeth and binned by cellular developmental potential, and we have shown that these transcriptional profiles are largely distinct from Celsr1 − cells (Figure 4).

DISCUSSION
We carried out single nucleus RNA-seq from cichlid fish replacement teeth and oral lamina and compared these cell populations to those from mammalian teeth. With this work we made two main discoveries.
First, we have extended the concept of deep homology of dental gene expression (12,67) to that of cell populations within teeth. Across hundreds of millions of years of vertebrate evolutionary history, major dental cell types (e.g., mesenchymal, epithelial, immune, endothelial) exhibit strong conservation of gene expression. Dental glial cells, by contrast, appear to have undergone gene expression evolution. We offer a few caveats and ways forward from this analysis. Cichlid replacement teeth are small and difficult to sample. We identified a method to extract, digest tissue and access nuclei from replacement teeth, but improvements can be made as single cell protocols for hard tissues develop. Improved sampling will likely lead to improved data quality from cichlid dental nuclei-this means we should be able to sequence with greater depth and diversity. Although methods exist to compare cell types from different experiments (24,58,59), these existing methods could not account for the evolutionary scale and experimental variance of the cell types we compared. We thus invented a strategy to compare cell types across species and samples by comparing the differentially expressed genes within each dataset. This worked well for our purposes, but as the scope of single-cell and single-nucleus datasets increases for vertebrate teeth, more robust comparative methods will need to be developed for deeper biological insights to be achieved.
Having demonstrated conservation of gene expression amongst vertebrate dental cell types, we next asked whether we could ascertain a common signature of stem-like cells in vertebrate regenerating teeth. The approach we chose combines selecting cells based on their expression of a molecular marker (in this case Celsr1/celsr1) with analysis of a cell's developmental potential. In the mouse incisor, Celsr1 marks a population of quiescent cells that are mobilized for accelerated growth after clipping; in cichlid replacement teeth, celsr1 co-localizes with label-retaining cells at the base of the tooth, at the tooth tip and in the cervical loops (19). However, Celsr1 (similar to other markers of stem-like cells) is pleiotropic and single-cell techniques have unprecedented sensitivity such that any chosen marker is likely to be expressed in cells from many clusters and in many differentiation states.
We therefore used CytoTRACE (60) to measure, in an unbiased manner, the differentiation state of every cell in available datasets. We first assessed this approach using annotated single cells along an incisor epithelial differentiation trajectory (18) and then identified the transcriptional signature of Celsr1 + cells, from vertebrate regenerating teeth, that are putatively (i) quiescent, (ii) cycling and (iii) differentiated.
Correspondingly, putatively quiescent Celsr1 cells differentially expressed other known markers of quiescence (Sox9, Notch2), while those in the putative cycling state differentially expressed markers associated with mitosis and the cell cycle (Mki67, Cdk4); finally, those in the differentiated state expressed markers of angiogenesis, a known pleiotropic function of Celsr1. Perhaps less expected, these transcriptional signatures are not only statistically distinct from one Celsr1 + CytoTRACE bin to another, but are also statistically distinct in Celsr1 + vs. Celsr1 − cell populations. Our discovery of Celsr1 + transcriptional profiles will aid future experiments in targeting specific populations. For instance, Sox9 and Cdk4 could be used to specifically target quiescent vs. cycling Celsr1 dental stemlike cells, respectively, with a transgenic tag or antibodies. In conclusion, our results highlight the advantages of an explicitly comparative strategy to solve fundamental problems in dental and regenerative biology.

Subjects
Cichlids were housed in the Physiological Research Laboratory at Georgia Institute of Technology (Atlanta, GA) and all experiments were performed according to institutional animal care and use policies. A total of 10 adult male Maylandia zebra individuals (IACUC protocol no. A100028) were used for single-cell experiments. Environmental conditions of aquaria were similar to those of the Lake Malawi environment; subjects were maintained on a 12-h:12-h light:dark cycle with full lights on between 8 am and 6 pm Eastern Standard Time (EST) and dim lights on for 60 min between light-dark transition (7 am−8 am and 6 pm−7 pm EST) in pH=8.2, 26.7 • C water and fed twice daily (Spirulina Flake; Pentair Aquatic Ecosystems, Apopka, FL, U.S.A.). All tanks were maintained on a central recirculating system.

Tissue Collection
Study animals were euthanized by tricaine methanesulfonate (MS-222) immersion and stored on ice until dissection. All animals were euthanized and dissected the day before nuclei preparation. Dissections of the first 10 individuals took place the afternoon before nuclei preparation and yielded a pooled sample of around 300 regenerating teeth along with soft tissue samples from upper and lower oral jaws. Oral jaws were separated from the head (Supplementary Figure 1A) and the soft tissue labial to each jaw was shaved away, exposing the jawbone underneath (Supplementary Figures 1B,C). To harvest regenerating teeth, an incision was made with a scalpel along the length of the exposed jawbone just under the line of surface teeth. The tip of the scalpel was then inserted into the cut and used to pry the outermost layer of bone up and off, exposing the tooth crypt containing 10-20 regenerating teeth. Teeth from all 10 animals were removed using forceps and pooled in 1 mL of 1x HBSS at −20 • C overnight. Soft tissue was saved from both jaws of a single individual and stored at −20 • C overnight. The following day, soft tissues were divided into four roughly 50 mg segments, which were processed separately and then pooled at the end of nuclei preparation. Each tissue segment was placed in 1 mL each of 1x HBSS (Sigma-Aldrich H4385) and briefly stored on ice (∼30 min) while reagents were prepared for nuclei preparation.

Cell Preparation
All steps were performed at 4 • C in a cold room. Nuclei preparation methods were based on the 10x Genomics Demonstrated Protocol for lysis and washing of single nuclei. To this protocol the following modifications were made: 1x HBSS was used in place of HEB; tissues were placed in lysis buffer for 10 min (to account for a smaller tissue mass) and then homogenized with disposable pellet pestles (Sigma-Aldrich Z359947) before halting lysis with HBSS; the tissue was suspended in 300 µL of wash buffer and passed through a 30 µm MACS SmartStrainer once before proceeding to FACS.

Fluorescence Activated Cell Sorting
Filtered samples were stained with DAPI at a concentration of 0.1 µL DAPI per 100 µL of suspension and sorted using the BD FACSAria Fusion. Gates were set conservatively to collect DAPI+ singlet nuclei (Supplementary Figure 2) and nuclei were sorted into 200 µL of wash buffer. Total volume in each collection tube was recorded, collected nuclei were centrifuged at 300 rcf, 4 • C for 10 min and supernatant was removed, leaving nuclei suspended in 50 µL of wash buffer. Event counts registered during sorting were used to estimate final cell suspension concentrations: from upper jaw soft tissue, 20,668 nuclei were collected for a final concentration of 413 nuclei/µL; from lower jaw soft tissue, 20,181 nuclei were collected for a final concentration of 403 nuclei/µL; from regenerating teeth 14,700 nuclei were collected for a final concentration of 294 nuclei/µL.

Library Preparation and Sequencing
Libraries were prepared for sequencing by the Genome Analysis Core at Georgia Tech using the 10x Chromium V3 Single Cell Gene Expression Solution protocol and reagent kit (10x Genomics 1000092) with a targeted cell recovery of 3,000 nuclei per sample. Sequencing was performed by the Molecular Evolution Core at Georgia Tech; libraries were sequenced on an NextSeq 500 (Illumina SY-415-1001). Soft tissue libraries were pooled and sequenced in one lane; tooth libraries were sequenced in a separate lane.

Pre-Processing
FASTQ files were processed with Cell Ranger version 3.1.0 (10X Genomics) (68), including alignment, filtering, barcode counting, and UMI counting. We obtained the Maylandia zebra Lake Malawi cichlid genome assembly (69) from NCBI RefSeq (assembly accession: GCF_000238955.4; Assembly name: M_zebra_UMD2a) and annotations from Ensembl version 98. Because nuclear RNA contains intronic sequences, reads were mapped to the M. zebra reference genome (including intronic regions) using a splice-aware alignment algorithm (STAR) within Cell Ranger using the count command with the default options. During alignment Cell Ranger filtered out UMIs that are homopolymers, contain N, or contain any base with a quality score <10. These steps were followed separately for the replacement teeth, the upper and lower jaw samples. The upper and lower samples were aggregated into one sample, hereby referred to as the soft jaw sample, using cellranger aggr. For all the samples, Cell Ranger generated a filtered featurebarcode matrix containing expression data for a total of 28,622 features (corresponding to genes). The number of total barcodes (corresponding to putative nuclei) were 821 and 1,381 for the replacement teeth and soft jaw samples, respectively.

Quality Control and Clustering of Cells
Many software suites exist for single cell RNA-seq analysis. Seurat v3 was chosen for its wide use in the field, robust documentation, control given to users over the parameters used, and flexibility in conversion to other data types. The filtered feature matrices from Cellranger were imported into Seurat. The matrices were normalized using the default method, which normalizes feature counts in a cell by the total number of counts for all features in the cell. Potential dead cells, doublets, and multiplets were filtered out by removing cells with >2,500 feature counts and 5% mitochondrial transcripts. Different minimum thresholds for feature counts were used for replacement teeth and jaw samples since these were different library preparations with distinct quality parameters. Thresholds employed were 125 and 300 feature counts for the replacement teeth and soft jaw samples, respectively. Next, features with high cell to cell variation were identified and the data was scaled prior to dimensionality reduction. PCA was used to reduce dimensionality, the first 30 principal components were calculated since all were determined as significant from JackStraw plots. A shared nearest neighbor graph was constructed using k = 20, from these Seurat-determined clusters of cells.

Biological Interpretation of Clusters
After clustering cells, DEGs for each cluster were examined in order to determine their biological identities. DEGs were found using the FindAllMarkers function from Seurat using the default thresholds, including the default logfc.threshold = 0.25 and min.pct = 0.1. DEGs were genes that had a Bonferroni adjusted p-value < 0.05. These DEGs from our clusters were compared to known markers for dental cell types with the most weight given to DEGs with the strongest adjusted p-values. Human orthologs for cichlid gene names were found using the BioConductor package in R and from a phylogeny-based approach using the TreeFam database (70). Next, DEGs were calculated by cichlid cell type and these are the cluster DEGs referred to throughout unless otherwise noted. These DEGs were again compared to known dental cell type markers to ensure accuracy.

Acquisition of Mouse and Human Dental Datasets
Single cell RNA-seq data from Krivanek et al. (18) was acquired from the author's website: http://pklab.med.harvard.edu/ruslan/ dental.atlas.html. The datasets used include the "General mouse incisor atlas", "Comparative map of mouse incisor and molar teeth", and "Human adult and growing molar teeth". The counts matrices were used to create Seurat objects and the spatial coordinates of cells from the author's website were imported into the objects. The function FindAllMarkers from Seurat was used to find cluster DEGs for each cell type as annotated by the authors.

Comparison of DEGs Between Cichlid, Mouse, and Human Dental Datasets
DEGs for each cell type within the mouse and human datasets were found using the FindAllMarkers function in Seurat using the default thresholds. Next, the DEGs from each cluster from each dataset were compared to one another in a pairwise manner. The number of DEGs that were consistent in direction, either both upregulated or downregulated, in the clusters was recorded. This number representing the overlap of DEGs was then normalized by the total number of DEGs in both clusters. For pairwise clusters that included a cichlid cluster in the comparison, a correction factor was applied to account for cichlid gene names where human orthologs could not be found, likely because of evolutionary divergence. The correction factor is equal to the number of cichlid cluster DEGs with human orthologs divided by the total number of cichlid cluster DEGs. The correction factor was roughly ∼80% for all cichlid clusters. This process was repeated for pairwise datasets with colors scaled by column in order to determine which mouse or human dataset had the greatest similarity with the cichlid clusters.

Expression of the Top 100 Unique Mammalian Cell Type DEGs
As another measure to ascertain the degree of homology between cell types in vertebrate dentition, we found both the magnitude of the expression of the top 100 unique mammalian cell type DEGs and the number of those expressed in each cell type in cichlids. For each cell type, we found DEGs with the most significant Bonferroni adjusted p-values that were additionally not a DEG for another cell type within each mammalian dataset. Next, the relative expression of these were found in the cichlid data sets using the ScaleData function in Seurat with the default parameters. The relative expression in the cichlid clusters was then summed for each mammalian cell type. Additionally, the identity of genes from the top 100 unique mammalian cell type DEGs that were expressed in each cichlid cell type was determined and the top 5 were noted.

CytoTRACE
CytoTRACE [Cellular (Cyto) Trajectory Reconstruction Analysis using gene Counts and Expression] (60) version 0.3.3 was used to determine the differentiation state of each cell. Many lineage trajectory algorithms exist for single nuclei data, but many rely on a priori knowledge of the starting point, intermediate states, or end point. In addition, other approaches have difficulty distinguishing between quiescent cells and mature cell types. CytoTRACE overcomes these weaknesses and has been validated on 33 scRNA-seq datasets. CytoTRACE determines the differentiation state of a cell based on transcriptional diversity, the number of genes expressed in the cell. More differentiated nuclei tend to restrict chromatin accessibility resulting in a decrease in transcriptional diversity. The raw transcript counts for each gene and cell were supplied to CytoTRACE using the default settings. CytoTRACE supplied a score for each cell ranging from 0 to 1. A score of 0 indicates a more differentiated nucleus and a score of 1 indicates a less differentiated nucleus. CytoTRACE ranks the differentiation state of cells relative to the others in the input, therefore CytoTRACE was performed on each dataset separately.

Comparison of DEGs Between Cells
Binned by CytoTRACE Score in Celsr1 + and Celsr1 -Cells First, CytoTRACE was run on the data to find the differentiation state of the cells. The cells in the bottom third quantile of CytoTRACE scores were put in the low bin, the middle quantile created the medium bin, and the top third quantile formed the high bin. Next, FindAllMarkers was run with the default parameters excluding the following: logfc.threshold = 0, and min.pct = 0.01. Then the percent difference in expression was calculated as pct.1 -pct.2 DEGs that mark the differentiation states of Celsr1 + and Celsr1 − cells are genes that pass the following criteria: Bonferroni adjusted p-value <0.05, absolute average logFC >0.25, absolute percent difference between bins >5%, and average logFC and percent difference in the same direction of effect (ie positive average logFC and positive percent difference). The DEGs found in corresponding bins Celsr1 + and Celsr1 − were compared. The average logFC values were subtracted (È) as well as their percent differences (Ω). DEGs that mark differentiation states of Celsr1 + , but not Celsr1 − had to pass these additional criteria: | È| > 0.25, | Ω | > 5%, and Èand Ω in the same direction of effect. Finally DEGs present in both Celsr1+ and Celsr1-of opposite direction of effect were not filtered (ie upregulated in one population and downregulated in the other).

Detection of DEGs Between Celsr1 + and Celsr1 -Cells by Bin
Instead of comparing the DEGs between CytoTRACE bins in Celsr1+ to those in Celsr1-, an additional, more direct approach was taken. After applying the same approach to create CytoTRACE bins outlined above on Celsr1+ and Celsr1-cells independently, DEGs were found between Celsr1+ and Celsr1in each bin: high, medium and low. This was done using the FindMarkers function from Seurat with the default thresholds, including the default logfc.threshold = 0.25 and min.pct = 0.1.

Pathway Enrichment Analysis
To gain more biological insight into lists of DEGs, pathway enrichment analysis was performed using ToppGene (71). GO terms from categories including Biological Process, Molecular Function, and Cellular Component are ranked by their overrepresentation in the query list. Additionally, p-values and p-values adjusted for multiple comparisons are returned. We compared the observed vs. expected number of genes in the query list in the top 5 Biological Process GO terms. The observed values are the number of genes in the query list in the GO term and the expected values are the number of genes in the query list multiplied by the number of genes in the GO term divided by the number of genes in the genome.

ETHICS STATEMENT
The animal study was reviewed and approved by Georgia Tech Institutional Animal Care and Use Committee (IACUC) protocols (permit A100003).

AUTHOR CONTRIBUTIONS
JSt conceived of the project. CP and JSt carried out the experiments. GG analyzed the data with significant intellectual contributions from TM and JSt. GG, TM, and JSt wrote the manuscript. All authors contributed to the article and approved the submitted version.