Cross-Species Analysis of Single-Cell Transcriptomic Data

Shafer, Maxwell E. R.

doi:10.3389/fcell.2019.00175

PERSPECTIVE article

Front. Cell Dev. Biol., 02 September 2019

Sec. Evolutionary Developmental Biology

Volume 7 - 2019 | https://doi.org/10.3389/fcell.2019.00175

Maxwell E. R. Shafer ^1,2^*

1. Biozentrum, University of Basel, Basel, Switzerland
2. Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, United States

Abstract

The ability to profile hundreds of thousands to millions of single cells using scRNA-sequencing has revolutionized the fields of cell and developmental biology, providing incredible insights into the diversity of forms and functions of cell types across many species. These technologies hold the promise of developing detailed cell type phylogenies which can describe the evolutionary and developmental relationships between cell types across species. This will require sampling of many species and taxa using single-cell transcriptomics, and methods to classify cell type homologies and diversifications. Many tools currently exist for analyzing single-cell data and identifying cell types. However, cross-species comparisons are complicated by many biological and technical factors. These factors include batch effects common to deep-sequencing approaches, well-known evolutionary relationships between orthologous and paralogous genes, and less well-understood evolutionary forces shaping transcriptome variation between species. In this review, I discuss recent developments in computational methods for the comparison of single-cell omic data across species. These approaches have the potential to provide invaluable insight into how evolutionary forces act at the level of the cell and will further our understanding of the evolutionary origins of animal and cellular diversity.

Introduction

Single-cell RNA sequencing has become a powerful and popular tool, yielding rich and informative cell-type atlases of many tissues, and even whole organisms (Cao et al., 2017; Haber et al., 2017; Achim et al., 2018; Zeisel et al., 2018; Sebé-Pedrós et al., 2018b). These experiments have allowed the characterization of hundreds of poorly understood cell types, and identification of previously unknown cellular diversity across multiple species (La Manno et al., 2016; Montoro et al., 2018; Plasschaert et al., 2018). These datasets allow us to ask questions about the origins of cellular diversity, and the evolutionary mechanisms which have shaped cellular form and function. An ultimate goal of these experiments will be to generate cell type phylogenies, describing the evolutionary relationships between cell types (Kin, 2015; Arendt et al., 2019). However, relating information obtained from different sources and different model and non-model organisms is confounded by many technical and biological factors that make comparisons of single-cell data difficult (Marioni and Arendt, 2017; Stuart and Satija, 2019). These include poorly understood forces shaping transcriptome evolution, and complications in assigning orthology and functional conservation between genes across species.

Much of our understanding of cell biology originates from characterizing cells by their functions, gene expression, and lineage relationships (Zeng and Sanes, 2017). Molecular distinctions between cell types, such as protein or gene expression, have become the de facto basis for categorizing cells, because they are convenient, easily measured, and comparable across models and systems. With recent advances in sequencing, microfluidics, and nanotechnologies, it is also now possible to profile the transcriptomes of thousands or even millions of cells in a single experiment (Cao et al., 2017; Underwood et al., 2017; Raj et al., 2018; Paolillo et al., 2019). Computational tools have been developed to interrogate these datasets, identifying clusters of cells with similar patterns of gene expression (Andrews and Hemberg, 2018). These clusters are interpreted as distinct cell types, and these methods have done a remarkable job at matching classification systems based on morphology and function (Marioni and Arendt, 2017; Butler et al., 2018; Moussa and Mãndoiu, 2018; Deng et al., 2019).

Though there is debate about whether these transcriptional distinctions are reliable indicators of cell types or cellular diversity, single-cell sequencing technologies are nonetheless very powerful and have the potential to be used to understand evolutionary relationships between cell types across species. Indeed, these technologies have recently been used to compare embryonic brain development in mice and humans, and the evolution of neuronal cell types in reptiles (Pollen et al., 2015, 2019; La Manno et al., 2016; Tosches et al., 2018). Many datasets are also being independently generated from diverse phyla (Achim et al., 2018; Plass et al., 2018; Siebert et al., 2018; Sebé-Pedrós et al., 2018a, b; Ryu et al., 2019).

These diverse datasets necessitate methodologies which can reconcile the technical and biological batch effects inherent in single-cell sequencing technologies. These tools will ideally be able to identify both homologous and divergent cell types between species, and the transcriptional mechanisms involved in their evolution (Marioni and Arendt, 2017). Here, I offer a perspective on the current state of the field of evolutionary cellular transcriptomics, technologies and platforms. This review will specifically focus on computational tools and approaches for combining and comparing single-cell datasets across species.

Single-Cell Sequencing and Single-Cell Clustering Approaches

Many solutions have been developed for separating, barcoding, and individually labeling cells (Jaitin et al., 2014; Picelli et al., 2014; Soumillon et al., 2014; Svensson et al., 2018). Advances in microfluidic and microwell technologies have offered an incredible increase in throughput, from hundreds of cells to thousands or millions of cells. These technologies involve either encapsulating cells in microfluidic droplets, or placing cells individually in microwells, greatly increasing our ability to observe heterogeneity and rare cell types (Islam et al., 2014; Klein et al., 2015; Macosko et al., 2015; Zheng et al., 2017). Techniques such as Sci-RNA-Seq further increase the number of cells analyzed by combinatorially barcoding cells during isolation (Cao et al., 2017). Because sequencing costs are limiting, these techniques increase the number of cells profiled at the expense of sequencing depth per cell. Shallow sequencing of many cells is thought to identify cellular heterogeneity more reliably than high-depth sequencing of fewer cells, such as in Smart-seq2 (Picelli et al., 2014).

With the advent of single-cell sequencing experiments numbering in the thousands to millions of cells, sophisticated approaches were needed to deal with the statistical challenges posed by the high dimensionality of such datasets. I will briefly describe the main steps taken by the popular single-cell genomics toolkit Seurat (Butler et al., 2018). Alternative methods are reviewed elsewhere (Bacher and Kendziorski, 2016; Stuart and Satija, 2019). Many of these packages produce analogous outputs (cluster annotations) which can then be compared across species using the techniques reviewed in the following sections. Initially, the high dimensionality of the datasets is reduced both by limiting the genes under consideration to so-called “highly variable genes” (those which contribute strongly to cell-to-cell variability) and by projecting the data into a lower-dimensional space using PCA (steps 1–4, Figure 1A; Butler et al., 2018; Yip et al., 2018). The most recent clustering algorithms employ graph-based methods for defining clusters after PCA based on modularity and density of cells within k-nearest neighbor graphs, grouping cells which are mutually close to each other in gene expression space (step 5, Figure 1A; Bacher and Kendziorski, 2016). Finally, tSNE or UMAP is used to visualize the clusters by collapsing the higher-dimensional variability into 2 or 3 dimensions (step 6, Figure 1A; van der Maaten and Hinton, 2008; Becht et al., 2019).
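The dimensionality-reduction and graph-building steps described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not Seurat's implementation: the function name, the toy count matrix, the scaling constant, and the use of raw variance as a stand-in for highly-variable-gene selection are all choices made for this example, and the community-detection and tSNE/UMAP steps are omitted.

```python
import numpy as np

def reduce_and_build_graph(counts, n_top_genes, n_pcs, k):
    """counts: cells x genes array of raw counts (each cell needs a non-zero total)."""
    # Scale each cell to a common total, then log-transform.
    totals = counts.sum(axis=1, keepdims=True)
    logged = np.log1p(counts / totals * 1e4)
    # Crude stand-in for highly-variable-gene selection (an assumption of this
    # sketch): keep the genes with the largest variance across cells.
    top_genes = np.argsort(logged.var(axis=0))[::-1][:n_top_genes]
    selected = logged[:, top_genes]
    # PCA via SVD of the gene-centered matrix.
    centered = selected - selected.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]
    # k-nearest-neighbor lists from Euclidean distances in PC space.
    dist = np.linalg.norm(pcs[:, None, :] - pcs[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]
    return top_genes, pcs, knn

# Toy data (hypothetical): cells 0-2 express genes 0-1, cells 3-5 express
# genes 2-3, and genes 4-5 are expressed at a similar level in every cell.
counts = np.array([
    [50, 40, 4, 5, 20, 21],
    [45, 52, 5, 4, 19, 20],
    [55, 47, 6, 5, 22, 18],
    [4, 5, 48, 51, 20, 19],
    [5, 6, 53, 44, 21, 22],
    [6, 4, 46, 49, 18, 20],
], dtype=float)

top_genes, pcs, knn = reduce_and_build_graph(counts, n_top_genes=4, n_pcs=2, k=2)
print("selected genes:", sorted(top_genes.tolist()))
print("neighbors of each cell:", knn.tolist())
```

In a real analysis the neighbor lists would be passed to a graph-clustering routine; here they only show that cells end up adjacent to cells with similar expression once uninformative genes are dropped.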

FIGURE 1

Accounting for Experimental and Biological Batch Effects

Comparing and contrasting single-cell datasets will allow for testing the reproducibility of observed biological phenomena, or identification of additional cell type heterogeneity by combining multiple datasets into larger cell-type atlases (Butler et al., 2018; Haghverdi et al., 2018). Comparisons of pharmacological, genetic, and experimental manipulations across different experiments can identify specific gene expression effects and perturbations of cellular states, like those observed for disease-associated microglia (Haber et al., 2017; Keren-Shaul et al., 2017; Johnson et al., 2018). Finally, cross-species comparisons of cell types within specific tissues will allow translation of knowledge between model and non-model systems and may suggest evolutionary relationships between cell types both within and between species for the generation of cell-type phylogenies (Marioni and Arendt, 2017).

However, technical batch effects can be introduced at every experimental step, from cell dissociation through isolation and barcoding to sequencing and analysis (Bacher and Kendziorski, 2016). In addition to species of origin, biological batch effects caused by differences in genetic background, age, and sex also need to be considered. Several groups have generated computational tools to deal with batch effects specific to single-cell data. These approaches take lessons from the comparison of bulk RNA-sequencing experiments, but have been improved to address the high heterogeneity of single-cell data (Haghverdi et al., 2018).

Comparing Cell Types Across Species

Species-specific single-cell datasets can either be analyzed and annotated separately or combined into a single analysis/annotation step. Separate analysis requires cell types to be cross-annotated (typically by hand) but preserves intra-dataset heterogeneity (Figures 1B,C). Combined analyses increase the number of cells used for clustering, allowing identification of additional heterogeneity and rare cell populations. However, they are more complex and computationally intensive, and may obscure species-specific cell types (Figure 2). Combined analyses “batch-correct” the underlying gene expression data, such that the expression levels of genes within cells from each species resemble each other (Haghverdi et al., 2018). In separate analyses, these batch effects can persist, affecting comparisons and annotations.

FIGURE 2

In one recent publication, a “gene-specificity index” was used to calculate cross-species pairwise correlations between cell clusters (Tosches et al., 2018). Using a specificity index resolves platform- and species-specific differences in expression quantification, and instead relies on whether a given gene is specific to a cell cluster or broadly expressed across all cell types (Dunn et al., 2013; Molnar et al., 2013; Kryuchkova-Mostacci and Robinson-Rechavi, 2016). In Tosches et al. (2018), within a set of cell types (C), the specificity index (s_g,c) of a gene (g) for a cell type (c) is defined as the ratio between the level of expression of g within c (g_c) and the mean expression of g across C (Figure 1B). The Pearson correlation of cell type gene specificity indices can then be calculated, identifying correlated clusters across datasets (red boxes, Figure 1B). The authors used this analysis to compare the pallium, hippocampus, and cortical cell types between turtles, lizards, and mammals. They discovered that mammalian interneuron cell types were ancestral to all amniotes, but that the mammalian neocortex is largely composed of lineage-specific cell types (Tosches et al., 2018).
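The specificity index and the cross-species correlation can be illustrated with a short NumPy sketch. The function names and the toy expression values are assumptions made for this example, and the sketch assumes every gene is expressed in at least one cell type so that no gene mean is zero.

```python
import numpy as np

def specificity_index(mean_expr):
    """mean_expr: genes x cell types matrix of average expression per cluster.
    Returns s[g, c] = expression of g in c / mean expression of g across all c."""
    gene_means = mean_expr.mean(axis=1, keepdims=True)
    return mean_expr / gene_means

def cross_species_correlation(expr_a, expr_b):
    """Pearson correlation between every cell type of species A (rows)
    and every cell type of species B (columns), over shared orthologs."""
    s_a = specificity_index(expr_a)
    s_b = specificity_index(expr_b)
    n_a = s_a.shape[1]
    # np.corrcoef treats rows as variables, so pass cell types x genes.
    corr = np.corrcoef(s_a.T, s_b.T)
    return corr[:n_a, n_a:]

# Toy data (hypothetical): 4 orthologous genes, 2 cell types per species.
# Species B is quantified on a very different scale from species A.
expr_a = np.array([[10, 1], [8, 2], [1, 9], [2, 12]], dtype=float)
expr_b = np.array([[100, 20], [90, 10], [15, 80], [5, 110]], dtype=float)

corr = cross_species_correlation(expr_a, expr_b)
# Rescaling one species leaves the result unchanged, because each gene is
# divided by its own mean within that species.
corr_rescaled = cross_species_correlation(expr_a, expr_b * 1000)
print(np.round(corr, 2))
```

Because the index is a ratio within each dataset, global or gene-specific differences in quantification between platforms cancel out, which is the property the text relies on.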

The previous approach requires cell types to be matched between species by hand, before correlations are calculated. Alternatively, random forest machine learning (RFML) can unbiasedly assign cluster matches across datasets (Breiman, 2001; Denisko and Hoffman, 2018). This has been used to assign cell types across developmental timescales and platforms in the zebrafish habenula, and mouse retina, allowing identification of additional heterogeneity, and differences between larval and adult cell types (Shekhar et al., 2016; Pandey et al., 2018). First, an algorithm is trained to predict the cell types of Species A based on the gene expression matrix generated by single-cell sequencing (step 1, Figure 1C). This produces a set of decision trees, each of which assigns cells to cell types, and which are used to generate a consensus prediction for each cell based on its gene expression signature. This decision forest can then be used to predict the Species A cell types that each of the cells from Species B most resembles. The result of such a comparison is a confusion matrix, which represents the percentage of cells from each cluster in Species B that resemble each cluster from Species A (Figure 1C).
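The last step, turning per-cell predictions into a confusion matrix, can be sketched as follows. To keep the example free of external dependencies, a nearest-centroid classifier stands in for the random forest; that substitution, the function names, and the toy data are all assumptions of this sketch and not the procedure used in the cited studies.

```python
import numpy as np

def train_centroids(expr, labels):
    """Stand-in classifier (NOT a random forest): one mean profile per cell type."""
    types = sorted(set(labels))
    labels = np.asarray(labels)
    centroids = np.vstack([expr[labels == t].mean(axis=0) for t in types])
    return types, centroids

def predict(expr, types, centroids):
    """Assign each cell to the cell type with the closest centroid."""
    dist = np.linalg.norm(expr[:, None, :] - centroids[None, :, :], axis=2)
    return [types[i] for i in dist.argmin(axis=1)]

def confusion_matrix(clusters_b, predicted_a, types_a):
    """Rows: Species B clusters. Columns: Species A cell types.
    Entries: percentage of cells in each B cluster assigned to each A type."""
    clusters = sorted(set(clusters_b))
    mat = np.zeros((len(clusters), len(types_a)))
    for cluster, pred in zip(clusters_b, predicted_a):
        mat[clusters.index(cluster), types_a.index(pred)] += 1
    return clusters, 100 * mat / mat.sum(axis=1, keepdims=True)

# Toy data (hypothetical): 2 shared genes, Species A annotated, Species B clustered.
expr_a = np.array([[9, 1], [10, 2], [1, 9], [2, 10]], dtype=float)
labels_a = ["neuron", "neuron", "glia", "glia"]
expr_b = np.array([[8, 2], [9, 3], [2, 8], [1, 9], [3, 10]], dtype=float)
clusters_b = ["B1", "B1", "B1", "B2", "B2"]

types_a, centroids = train_centroids(expr_a, labels_a)
predicted = predict(expr_b, types_a, centroids)
clusters, percent = confusion_matrix(clusters_b, predicted, types_a)
print(types_a)
print(np.round(percent, 1))
```

A cluster whose cells are split across several columns, as cluster B1 is here, flags either heterogeneity within that cluster or a cell type without a clean counterpart in the other species.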

Computational Integration of Single-Cell Datasets

Even assuming clusters are correctly matched across datasets, comparative analysis of cell transcriptomes remains a difficult task due to batch effects (Stuart and Satija, 2019). Computational integration of datasets allows for unified downstream analysis; however, several factors must be taken into account when removing species-specific batch effects. Most batch correction methods are based on linear regression: they fit a linear model describing the batch effect and then impute a new expression matrix without the modeled batch effect (Johnson et al., 2007; Risso et al., 2014; Ritchie et al., 2015). This approach is problematic for single-cell RNA-seq data because it assumes an identical population of cell types within each dataset, and a uniform batch effect across all cell types (Haghverdi et al., 2018; Welch et al., 2019). Single-cell RNA-seq integration methods must be able to distinguish between shared and cell type specific differences between species, and account for differences due to sampling method (number of cells/genes observed, or differences due to dissociation protocols between species). In general, these techniques aim to embed cells from both species into a shared lower-dimensional space, within which clusters and cells can be compared.
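A small numerical example shows why a uniform linear correction can fail when datasets differ in cell type composition. Per-batch mean-centering is used here as the simplest possible linear batch model; it is an illustrative assumption, not the specific model of any of the cited packages, and the data are made up.

```python
import numpy as np

def center_per_batch(expr, batch):
    """Subtract each batch's per-gene mean: the simplest linear batch model."""
    corrected = expr.astype(float).copy()
    for b in np.unique(batch):
        mask = batch == b
        corrected[mask] -= corrected[mask].mean(axis=0)
    return corrected

def gap_between_batches(expr, batch, cell_type, which):
    """Distance between the batch-0 and batch-1 means of one cell type."""
    sel = cell_type == which
    mean0 = expr[sel & (batch == 0)].mean(axis=0)
    mean1 = expr[sel & (batch == 1)].mean(axis=0)
    return float(np.linalg.norm(mean0 - mean1))

# Toy data (hypothetical): one gene. Cell type X is shared and carries a true
# batch effect of +1 in batch 1; cell type Y exists only in batch 1.
expr = np.array([[0.0], [0.0], [1.0], [1.0], [11.0], [11.0]])
batch = np.array([0, 0, 1, 1, 1, 1])
cell_type = np.array(["X", "X", "X", "X", "Y", "Y"])

corrected = center_per_batch(expr, batch)
gap_before = gap_between_batches(expr, batch, cell_type, "X")
gap_after = gap_between_batches(corrected, batch, cell_type, "X")
print(gap_before, gap_after)
```

In this toy case the batch-1 mean is dominated by the batch-specific cell type Y, so centering pushes the shared cell type X further apart across batches rather than aligning it.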

The first such integration method to be published, mnnCorrect/fastMNN, identifies Mutual Nearest Neighbors (MNNs) in high-dimensional gene expression space and uses them to derive cell type specific batch-correction vectors (Haghverdi et al., 2018). MNNs are pairs of cells, one from each dataset, which are mutually closest to each other across datasets (Figure 2A). The difference between the expression profiles for each pair of MNN cells is a vector that represents the biological batch effect, and is used to impute new batch-corrected matrices (dotted lines, Figure 2A; Haghverdi et al., 2018).
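The core idea of mutual nearest neighbors can be sketched as follows. This is a bare-bones illustration under stated assumptions: it uses plain Euclidean distance and a simple average of the pairwise difference vectors, and it leaves out the normalization and smoothing steps of the published method. The function name and toy coordinates are hypothetical.

```python
import numpy as np

def mutual_nearest_neighbors(a, b, k):
    """a, b: cells x genes arrays over the same genes.
    Returns sorted (i, j) pairs where b[j] is among the k nearest neighbors
    of a[i] in b AND a[i] is among the k nearest neighbors of b[j] in a."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    nn_a_in_b = np.argsort(dist, axis=1)[:, :k]    # for each cell in a
    nn_b_in_a = np.argsort(dist, axis=0)[:k, :].T  # for each cell in b
    pairs = [(i, int(j)) for i in range(len(a))
             for j in nn_a_in_b[i] if i in nn_b_in_a[j]]
    return sorted(pairs)

# Toy data (hypothetical): both datasets contain cell type X, offset by a
# batch effect of roughly (+1, +1). Y exists only in dataset a, Z only in b.
a = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]])    # X, X, Y
b = np.array([[1.0, 1.0], [1.5, 1.0], [-8.0, 9.0]])   # X, X, Z

pairs = mutual_nearest_neighbors(a, b, k=2)
# Each MNN pair yields one batch vector; here they are simply averaged.
batch_vector = np.mean([b[j] - a[i] for i, j in pairs], axis=0)
print(pairs, batch_vector)
```

The dataset-specific cells (index 2 in each array) appear in no pair, so they do not contribute to the estimated batch vector.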

The R toolkit Seurat has also incorporated several methods for dataset integration (Butler et al., 2018). The original Seurat alignment procedure involves identifying shared correlation structure across the datasets or species using Canonical Correlation Analysis (CCA) (Figure 2A). CCA identifies groups of genes which have correlated differences in expression. These differences are then used to batch correct each group of genes differently using non-linear dynamic time warping, resulting in a shared low-dimensional space (Figure 2A; Berndt and Clifford, 1994). In Seurat v3.0, the authors have incorporated the use of MNNs to aid integration. Following CCA and dynamic time warping, MNNs are identified between datasets and used as “anchors” to compute further correction vectors, similar to mnnCorrect/fastMNN (Haghverdi et al., 2018; Stuart et al., 2019).

One major issue with these approaches is overfitting during integration, which can merge distinct cell types or obscure dataset-specific gene expression differences. The use of MNNs by both Seurat and mnnCorrect/fastMNN reduces this effect when cell types are present in only a subset of the datasets, because cells of such types will not have a mutual nearest neighbor in any other dataset. The panoramic stitching algorithms of Scanorama use a more generalized MNN technique, and aim to reduce overfitting between datasets even further, using a process that is similar to the creation of panoramas from individual images (Hie et al., 2018).

A third method, LIGER, uses integrative non-negative matrix factorization (iNMF) to learn shared and unique gene expression signatures between datasets (Welch et al., 2019). iNMF decomposes one matrix (such as a cell by gene expression matrix) into multiple matrices of basis vectors (cell by factor matrix) and coefficient vectors (factor by gene matrix). Factors represent patterns of gene co-regulation, which typically correspond to groups of genes representing specific cell types. For each dataset, LIGER also infers separate factors that correspond to species-specific signals (Figure 2B). Accounting for species-specific factors allows cell types to be identified across datasets, as well as the characterization of genes which contribute to species-specific differences in each cell type (Figure 2B). In addition to correcting species-specific batch effects, both Seurat and LIGER can integrate data across modalities (protein expression, chromatin modifications, and spatial localization) (Stuart and Satija, 2019; Welch et al., 2019).
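The decomposition described above can be made concrete by writing out the quantity that iNMF is usually described as minimizing over non-negative matrices: sum_i ||E_i - H_i (W + V_i)||_F^2 + lambda * sum_i ||H_i V_i||_F^2, where E_i is the cell by gene matrix of dataset i, H_i its cell by factor matrix, W the shared factor by gene matrix, and V_i the dataset-specific factor by gene matrix. The sketch below only evaluates this objective for given matrices and does not perform the optimization. The symbol names, the value of lambda, and the toy matrices are assumptions of this example, not notation used in this article.

```python
import numpy as np

def inmf_objective(E, H, W, V, lam):
    """Evaluate the iNMF objective for lists E, H, V (one entry per dataset)
    and a shared factor-by-gene matrix W. No optimization is performed."""
    total = 0.0
    for E_i, H_i, V_i in zip(E, H, V):
        # Reconstruction error using shared plus dataset-specific factors.
        total += np.linalg.norm(E_i - H_i @ (W + V_i), "fro") ** 2
        # Penalty that discourages large dataset-specific components.
        total += lam * np.linalg.norm(H_i @ V_i, "fro") ** 2
    return float(total)

# Toy data (hypothetical): 2 factors, 3 genes, 2 cells per species.
W = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])      # shared signatures
V_a = np.zeros((2, 3))                                # species A: no specific signal
V_b = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])    # species B: factor 2 gains gene 3
H_a = np.array([[1.0, 0.0], [0.0, 2.0]])
H_b = np.array([[2.0, 0.0], [0.0, 1.0]])
E_a = H_a @ (W + V_a)
E_b = H_b @ (W + V_b)

with_specific = inmf_objective([E_a, E_b], [H_a, H_b], W, [V_a, V_b], lam=0.5)
shared_only = inmf_objective([E_a, E_b], [H_a, H_b], W, [V_a, np.zeros((2, 3))], lam=0.5)
print(with_specific, shared_only)
```

With this choice of lambda, modeling the species-specific signal gives a lower objective than forcing all structure into the shared factors. A larger lambda would favor the shared-only solution, which is the trade-off the penalty term controls.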

Finally, several tools have been developed for computationally efficient integration of either extremely large datasets, or an extensive number of datasets. Harmony corrects analogous cell types from different datasets toward a shared centroid in low-dimensional PCA space, running iteratively until the datasets converge (Figure 2C; Korsunsky et al., 2018). Conos uses a unified graph representation to map cell types across extensive collections of datasets. Spurious connections between datasets are minimized – only cells mapping to each other across multiple datasets are used to identify common subpopulations (Barkas et al., 2018). It will be important in the near future for all of these tools to be benchmarked for different kinds of data, and against each other extensively. I foresee that many of these techniques will be complementary, and that combining approaches will likely be critical for achieving robust performance across many species.

Incorporating Understanding of Transcriptome Evolution Into Single-Cell Comparisons

Though the above approaches offer exciting possibilities for comparing single-cell data across species, many caveats exist for their implementation. All current approaches require that only the orthologous genes between the species are used during analysis. These genes are used during feature selection and PCA (Figure 1A). Non-homologous genes expressed in only one dataset contribute heavily to variation, and can drive cells to cluster with their own species rather than the same cell type across species (Figure 2C; Stuart and Satija, 2019). However, species-specific information may be lost by excluding genes without one-to-one matches, or with one-to-many matches. Indeed, clade-specific genes are known to drive species-specific cell type diversification (Santos et al., 2017; Florio et al., 2018), and sub- or neo-functionalization in expression patterns of one gene copy following gene duplication is common (Figure 2D; Farrè and Albà, 2010).

For closely related species, such as humans and mice, gene symbols can be easily matched to identify orthologs. For more distantly related organisms, databases such as ENSEMBL can be used to identify one-to-one matches (Zerbino et al., 2018). This works well for closely related species, but becomes more difficult as the evolutionary distance between species increases and the relationships between genes become less clear (Thornton and Desalle, 2000). Orthology identification has been largely addressed by the field of phylogenomics, both to identify species relationships and to functionally annotate genomes. Many techniques exist for the detection of orthology, most of which are based on sequence similarity and reciprocal BLAST; these and other methods are reviewed elsewhere (Sonnhammer et al., 2014; Nichio et al., 2017). Incorporating measures of gene orthology or sequence similarity into clustering algorithms will be important to avoid reliance on one-to-one homology for understanding gene function.
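Restricting an analysis to one-to-one orthologs, and seeing what is discarded in the process, can be sketched with the standard library alone. The input is assumed to be a list of (gene in species A, gene in species B) assignments such as might be exported from an orthology database. The function name and gene identifiers are hypothetical.

```python
from collections import Counter

def split_orthologs(pairs):
    """pairs: iterable of (gene_a, gene_b) orthology assignments.
    Returns (one_to_one, dropped): pairs whose genes each occur exactly once,
    and the remaining one-to-many or many-to-many pairs."""
    pairs = sorted(set(pairs))
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    one_to_one = [(a, b) for a, b in pairs
                  if count_a[a] == 1 and count_b[b] == 1]
    dropped = [p for p in pairs if p not in one_to_one]
    return one_to_one, dropped

# Hypothetical assignments: geneA2 was duplicated in the lineage of species B.
pairs = [
    ("geneA1", "geneB1"),
    ("geneA2", "geneB2a"),
    ("geneA2", "geneB2b"),
    ("geneA3", "geneB3"),
]

one_to_one, dropped = split_orthologs(pairs)
print(one_to_one)
print(dropped)
```

The dropped list is exactly the information that is lost under a one-to-one requirement, including duplicated genes whose copies may have diverged in expression.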

Recent work has also identified unique evolutionary forces driving transcriptome variation between species (Liang et al., 2018). Groups of genes with similar regulatory logic are thought to evolve in a modular fashion, with transcriptional changes in these genes linked by the transcription factors which control their expression (Arendt et al., 2016). Some of the integration approaches outlined above may already account for such correlated evolutionary differences in gene expression (LIGER, Seurat). Alternatively, removing the most highly correlated genes during clustering analysis may also be a prudent approach (Liang et al., 2018).

Future Perspectives

The construction of cellular phylogenies should also strive to correctly identify the evolutionary relationships between transcriptionally similar cell types both within and between species. Similarities may result from shared ancestry (homology) or from convergence onto the same cellular identity (homoplasy). The re-use, re-purposing, or co-option of homologous cellular modules and gene regulatory networks is thought to underlie cell type convergence (Tschopp and Tabin, 2017). Such deep homology not only results in similar cellular functions, but potentially also in highly similar cellular transcriptomes. It may therefore be difficult to disentangle homoplasy from homology using single-cell sequencing. Sampling many tissues along larger phylogenies will be necessary to identify where and when specific cell types appear in evolutionary history (Hejnol and Lowe, 2015). From these experiments parsimonious explanations can be developed, providing evidence for homology or homoplasy, and identifying the evolutionary history of specific cellular identities.

Finally, it will be necessary to incorporate phylogenetic comparative methods when comparing differences between species in regard to cell types and gene expression patterns. Biological traits show dependence across species due to the evolutionary history of those species – with more closely related species sharing more similar traits. This should also apply to cell type identities and gene expression patterns (Dunn et al., 2013). Phylogenetic comparative methods account for evolutionary history, modeling trait changes along evolutionary trees, and explicitly take into account their dependence during statistical comparisons (Felsenstein, 2002; Garamszegi, 2014). These have been successfully adapted for bulk transcriptomic data and should be extended to single-cell transcriptomics, where independence of traits is often assumed (Dunn et al., 2013).
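One classical phylogenetic comparative method, independent contrasts, illustrates how evolutionary history can be taken into account. The sketch below assumes a fully resolved tree with known branch lengths and a single continuous trait per species, for example the expression level of one gene in a homologous cell type. The tree encoding, function name, and trait values are assumptions of this example.

```python
import math

def independent_contrasts(node, values):
    """node: a tip name (str) or a tuple (left, left_len, right, right_len).
    Returns (estimated value at node, extra branch length, list of contrasts)."""
    if isinstance(node, str):
        return values[node], 0.0, []
    left, v_l, right, v_r = node
    x_l, extra_l, c_l = independent_contrasts(left, values)
    x_r, extra_r, c_r = independent_contrasts(right, values)
    # Branches leading to internal nodes are lengthened to reflect the
    # uncertainty in the estimated ancestral value.
    v_l += extra_l
    v_r += extra_r
    # Standardized contrast between the two daughter lineages.
    contrast = (x_l - x_r) / math.sqrt(v_l + v_r)
    # Ancestral estimate: average weighted by inverse branch length.
    value = (x_l / v_l + x_r / v_r) / (1 / v_l + 1 / v_r)
    extra = v_l * v_r / (v_l + v_r)
    return value, extra, c_l + c_r + [contrast]

# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1) and trait values per species.
tree = (("A", 1.0, "B", 1.0), 1.0, ("C", 1.0, "D", 1.0), 1.0)
trait = {"A": 1.0, "B": 3.0, "C": 6.0, "D": 10.0}

root_value, _, contrasts = independent_contrasts(tree, trait)
print(root_value, contrasts)
```

Four species yield three contrasts, which can be treated as independent observations in downstream statistics, whereas the four raw species values cannot.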

Conclusion

Many techniques, tools, and technologies for single-cell sequencing are already applicable for comparisons across species. However, improvement and refinement of current approaches based on evolutionary knowledge should be considered a priority for the field of transcriptomics and evolutionary cell biology. Understanding the evolutionary history and relationships between cells will provide insight into definitions of cell types, and the molecular mechanisms that govern their identities. Using this evolutionary framework, examining the continuum between developmental stages, cell states, and cell types may even elucidate how cell types evolve (Griffith et al., 2018; Arendt et al., 2019). A holistic identification of cell types and their evolutionary origins will require the combination of multiple lines of evidence, including not only molecular identification, but also functional interrogation and developmental lineage information. Recent approaches have been developed to reconstruct developmental lineage trajectories in silico or using CRISPR barcodes (Briggs et al., 2018; Farrell et al., 2018; Plass et al., 2018; Raj et al., 2018; Wagner et al., 2018; Packer et al., 2019). Incorporating lineage information into evolutionary comparisons will be a difficult, but important task going forward. Such a comprehensive understanding of evolution and cell types will allow us to build cell type phylogenies, and to use them to ask important questions about how cellular changes affect organismal fitness and selection, and how evolution acts on the biological unit of the cell.

Statements

Author contributions

MS conceived and wrote the manuscript.

Funding

This work was supported by a post-doctoral fellowship from the Canadian Institutes of Health Research (CIHR) to MS.

Acknowledgments

I am grateful to the members of the Schier lab (Biozentrum, University of Basel), including A. Schier and B. Raj, and to A. Sawh and D. Dylus for their excellent advice, support, and feedback during the writing of this manuscript.

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1
AchimK.ElingN.VergaraH. M.BertucciP. Y.MusserJ.VopalenskyP.et al (2018). Whole-body single-cell sequencing reveals transcriptional domains in the annelid larval body.Mol. Biol. Evol.351047–1062. 10.1093/molbev/msx336
2
AndrewsT. S.HembergM. (2018). Identifying cell populations with scRNASeq.Mol. Aspects Med.59114–122. 10.1016/j.mam.2017.07.002
3
ArendtD.BertucciP. Y.AchimK.MusserJ. M. (2019). Evolution of neuronal types and families.Curr. Opin. Neurobiol.56144–152. 10.1016/J.CONB.2019.01.022
4
ArendtD.MusserJ. M.BakerC. V. H.BergmanA.CepkoC.ErwinD. H.et al (2016). The origin and evolution of cell types.Nat. Rev. Genet.17744–757. 10.1038/nrg.2016.127
5
BacherR.KendziorskiC. (2016). Design and computational analysis of single-cell RNA-sequencing experiments.Genome Biol.17:63. 10.1186/s13059-016-0927-y
6
BarkasN.PetukhovV.NikolaevaD.LozinskyY.DemharterS.KhodosevichK.et al (2018). Wiring together large single-cell RNA-seq sample collections.bioRxiv460246. 10.1101/460246
- CrossRef
- Google Scholar
7
BechtE.McInnesL.HealyJ.DutertreC. A.KwokI. W. H.NgL. G.et al (2019). Dimensionality reduction for visualizing single-cell data using UMAP.Nat. Biotechnol.3738–44. 10.1038/nbt.4314
8
BerndtD.CliffordJ. (1994). “Using dynamic time warping to find patterns in time series,” in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, Seattle, WA.
- Google Scholar
9
BreimanL. (2001). Random forrest.Mach. Learn.45:5. 10.1023/A:1010933404324
- CrossRef
- Google Scholar
10
BriggsJ. A.WeinrebC.WagnerD. E.MegasonS.PeshkinL.KirschnerM. W.et al (2018). The dynamics of gene expression in vertebrate embryogenesis at single-cell resolution.Science360:eaar5780. 10.1126/science.aar5780
11
ButlerA.HoffmanP.SmibertP.PapalexiE.SatijaR. (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species.Nat. Biotechnol.36411–420. 10.1038/nbt.4096
12
CaoJ.PackerJ. S.RamaniV.CusanovichD. A.HuynhC.DazaR.et al (2017). Comprehensive single-cell transcriptional profiling of a multicellular organism.Science357661–667. 10.1126/science.aam8940
13
DengY.BaoF.DaiQ.WuL. F.AltschulerS. J. (2019). Scalable analysis of cell-type composition from single-cell transcriptomics using deep recurrent learning.Nat. Methods16311–314. 10.1038/s41592-019-0353-7
14
DeniskoD.HoffmanM. M. (2018). Classification and interaction in random forests.Proc. Natl. Acad. Sci. U.S.A.1151690–1692. 10.1073/pnas.1800256115
15
DunnC. W.LuoX.WuZ. (2013). Phylogenetic analysis of gene expression.Integr. Comp. Biol.53847–856. 10.1093/icb/ict068
16
FarrèD.AlbàM. M. (2010). Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates.Mol. Biol. Evol.27325–335. 10.1093/molbev/msp242
17
FarrellJ. A.WangY.RiesenfeldS. J.ShekharK.RegevA.SchierA. F. (2018). Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis.Science360:eaar3131. 10.1126/science.aar3131
18
FelsensteinJ. (2002). Phylogenies and the comparative method.Am. Nat.1251–15. 10.1086/284325
- CrossRef
- Google Scholar
19
FlorioM.HeideM.PinsonA.BrandlH.AlbertM.WinklerS.et al (2018). Evolution and cell-type specificity of human-specific genes preferentially expressed in progenitors of fetal neocortex.eLife71–37. 10.7554/eLife.32332
20
GaramszegiL. Z. (2014). Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology.Berlin: Springer.
- Google Scholar
21
GriffithO. W.WagnerG. P.ErkenbrackE. M.MaziarzJ. D.LiangC.ChavanA. R.et al (2018). The mammalian decidual cell evolved from a cellular stress response.PLoS Biol.16:e2005594. 10.1371/journal.pbio.2005594
22
HaberA. L.BitonM.RogelN.HerbstR. H.ShekharK.SmillieC.et al (2017). A single-cell survey of the small intestinal epithelium.Nature551333–339. 10.1038/nature24489
23
HaghverdiL.LunA. T. L.MorganM. D.MarioniJ. C. (2018). Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.Nat. Biotechnol.36421–427. 10.1038/nbt.4091
24
HejnolA.LoweC. J. (2015). Embracing the comparative approach: how robust phylogenies and broader developmental sampling impacts the understanding of nervous system evolution.Philos. Trans. R. Soc. B Biol. Sci.370:20150045. 10.1098/rstb.2015.0045
25
HieB. L.BrysonB.BergerB. (2018). Panoramic stitching of heterogeneous single-cell transcriptomic data.bioRxiv 371179. 10.1101/371179
- CrossRef
- Google Scholar
26
IslamS.ZeiselA.JoostS.La MannoG.ZajacP.KasperM.et al (2014). Quantitative single-cell RNA-seq with unique molecular identifiers.Nat. Methods11163–166. 10.1038/nmeth.2772
27
JaitinD. A.KenigsbergE.Keren-ShaulH.ElefantN.PaulF.ZaretskyI.et al (2014). Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types.Science343776–779. 10.1126/science.1247651
28
JohnsonM. B.SunX.KodaniA.Borges-MonroyR.GirskisK. M.RyuS. C.et al (2018). Aspm knockout ferret reveals an evolutionary mechanism governing cerebral cortical size letter.Nature556370–375. 10.1038/s41586-018-0035-0
29
JohnsonW. E.LiC.RabinovicA. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods.Biostatistics8118–127. 10.1093/biostatistics/kxj037
30
Keren-ShaulH.SpinradA.WeinerA.Matcovitch-NatanO.Dvir-SzternfeldR.UllandT. K.et al (2017). A unique microglia type associated with restricting development of Alzheimer’s disease.Cell1691276.e17–1290.e17. 10.1016/j.cell.2017.05.018
31
KinK. (2015). Inferring cell type innovations by phylogenetic methods-concepts, methods, and limitations.J. Exp. Zool. Part B Mol. Dev. Evol.324653–661. 10.1002/jez.b.22657
32
KleinA. M.MazutisL.AkartunaI.TallapragadaN.VeresA.LiV.et al (2015). Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells.Cell1611187–1201. 10.1016/j.cell.2015.04.044
33
KorsunskyI.FanJ.SlowikowskiK.ZhangF.WeiK.BaglaenkoY.et al (2018). Fast, sensitive, and accurate integration of single cell data with Harmony.bioRxiv461954. 10.1101/461954
- CrossRef
- Google Scholar
34
Kryuchkova-MostacciN.Robinson-RechaviM. (2016). Tissue-specificity of gene expression diverges slowly between orthologs, and rapidly between paralogs.PLoS Comput. Biol.12:e1005274. 10.1371/journal.pcbi.1005274
35
La MannoG.GyllborgD.CodeluppiS.NishimuraK.SaltoC.ZeiselA.et al (2016). Molecular diversity of midbrain development in mouse, human, and stem cells.Cell167566.e19–580.e19. 10.1016/j.cell.2016.09.027
36
LiangC.MusserJ. M.CloutierA.PrumR. O.WagnerG. P. (2018). Pervasive correlated evolution in gene expression shapes cell and tissue type transcriptomes.Genome Biol. Evol.10538–552. 10.1093/gbe/evy016
37
Macosko, E. Z., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., et al. (2015). Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214. doi: 10.1016/j.cell.2015.05.002
38
Marioni, J. C., and Arendt, D. (2017). How single-cell genomics is changing evolutionary and developmental biology. Annu. Rev. Cell Dev. Biol. 33, 537–553. doi: 10.1146/annurev-cellbio-100616-060818
39
Molnar, Z., Margulies, E. H., Wang, W. Z., Garcia-Moreno, F., Montiel, J. F., Belgard, T. G., et al. (2013). Adult pallium transcriptomes surprise in not reflecting predicted homologies across diverse chicken and mouse pallial sectors. Proc. Natl. Acad. Sci. U.S.A. 110, 13150–13155. doi: 10.1073/pnas.1307444110
40
Montoro, D. T., Haber, A. L., Biton, M., Vinarsky, V., Lin, B., Birket, S. E., et al. (2018). A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature 560, 319–324. doi: 10.1038/s41586-018-0393-7
41
Moussa, M., and Măndoiu, I. I. (2018). Single cell RNA-seq data clustering using TF-IDF based methods. BMC Genomics 19(Suppl. 6):569. doi: 10.1186/s12864-018-4922-4
42
Nichio, B. T. L., Marchaukoski, J. N., and Raittz, R. T. (2017). New tools in orthology analysis: a brief review of promising perspectives. Front. Genet. 8:165. doi: 10.3389/fgene.2017.00165
43
Packer, J. S., Zhu, Q., Huynh, C., Sivaramakrishnan, P., Preston, E., Dueck, H., et al. (2019). A lineage-resolved molecular atlas of C. elegans embryogenesis at single cell resolution. bioRxiv 565549. doi: 10.1101/565549
44
Pandey, S., Shekhar, K., Regev, A., and Schier, A. F. (2018). Comprehensive identification and spatial mapping of habenular neuronal types using single-cell RNA-Seq. Curr. Biol. 28, 1052.e7–1065.e7. doi: 10.1016/j.cub.2018.02.040
45
Paolillo, C., Londin, E., and Fortina, P. (2019). Single-cell genomics. Clin. Chem. 65, 972–985. doi: 10.1373/clinchem.2017.283895
46
Picelli, S., Faridani, O. R., Björklund, Å. K., Winberg, G., Sagasser, S., and Sandberg, R. (2014). Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181. doi: 10.1038/nprot.2014.006
47
Plass, M., Solana, J., Alexander Wolf, F., Ayoub, S., Misios, A., Glažar, P., et al. (2018). Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics. Science 360:eaaq1723. doi: 10.1126/science.aaq1723
48
Plasschaert, L. W., Žilionis, R., Choo-Wing, R., Savova, V., Knehr, J., Roma, G., et al. (2018). A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature 560, 377–381. doi: 10.1038/s41586-018-0394-6
49
Pollen, A. A., Bhaduri, A., Andrews, M. G., Nowakowski, T. J., Meyerson, O. S., Mostajo-Radji, M. A., et al. (2019). Establishing cerebral organoids as models of human-specific brain evolution. Cell 176, 743.e17–756.e17. doi: 10.1016/j.cell.2019.01.017
50
Pollen, A. A., Nowakowski, T. J., Chen, J., Retallack, H., Sandoval-Espinosa, C., Nicholas, C. R., et al. (2015). Molecular identity of human outer radial glia during cortical development. Cell 163, 55–67. doi: 10.1016/j.cell.2015.09.004
51
Raj, B., Wagner, D. E., McKenna, A., Pandey, S., Klein, A. M., Shendure, J., et al. (2018). Simultaneous single-cell profiling of lineages and cell types in the vertebrate brain. Nat. Biotechnol. 36, 442–450. doi: 10.1038/nbt.4103
52
Risso, D., Ngai, J., Speed, T. P., and Dudoit, S. (2014). Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902. doi: 10.1038/nbt.2931
53
Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., et al. (2015). Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43:e47. doi: 10.1093/nar/gkv007
54
Ryu, K. H., Huang, L., Kang, H. M., and Schiefelbein, J. (2019). Single-cell RNA sequencing resolves molecular relationships among individual plant cells. Plant Physiol. 179, 1444–1456. doi: 10.1104/pp.18.01482
55
Santos, M. E., Le Bouquin, A., Crumiere, A. J. J., and Khila, A. (2017). Taxon-restricted genes at the origin of a novel trait allowing access to a new environment. Science 358, 386–390. doi: 10.1126/science.aan2748
56
Sebé-Pedrós, A., Chomsky, E., Pang, K., Lara-Astiaso, D., Gaiti, F., Mukamel, Z., et al. (2018a). Early metazoan cell type diversity and the evolution of multicellular gene regulation. Nat. Ecol. Evol. 2, 1176–1188. doi: 10.1038/s41559-018-0575-6
57
Sebé-Pedrós, A., Saudemont, B., Chomsky, E., Plessier, F., Mailhé, M. P., Renno, J., et al. (2018b). Cnidarian cell type diversity and regulation revealed by whole-organism single-cell RNA-Seq. Cell 173, 1520.e20–1534.e20. doi: 10.1016/j.cell.2018.05.019
58
Shekhar, K., Lapan, S. W., Whitney, I. E., Tran, N. M., Macosko, E. Z., Kowalczyk, M., et al. (2016). Comprehensive classification of retinal bipolar neurons by single-cell transcriptomics. Cell 166, 1308.e30–1323.e30. doi: 10.1016/j.cell.2016.07.054
59
Siebert, S., Farrell, J. A., Cazet, J. F., Abeykoon, Y., Primack, A. S., Schnitzler, C. E., et al. (2018). Stem cell differentiation trajectories in Hydra resolved at single-cell resolution. bioRxiv 460154. doi: 10.1101/460154
60
Sonnhammer, E. L. L., Gabaldon, T., Sousa Da Silva, A. W., Martin, M., Robinson-Rechavi, M., Boeckmann, B., et al. (2014). Big data and other challenges in the quest for orthologs. Bioinformatics 30, 2993–2998. doi: 10.1093/bioinformatics/btu492
61
Soumillon, M., Cacchiarelli, D., Semrau, S., van Oudenaarden, A., and Mikkelsen, T. S. (2014). Characterization of directed differentiation by high-throughput single-cell RNA-Seq. bioRxiv 3236. doi: 10.1101/003236
62
Stuart, T., Butler, A., Hoffman, P., Hafemeister, C., Papalexi, E., Mauck, W. M., et al. (2019). Comprehensive integration of single-cell data. Cell 177, 1888.e21–1902.e21. doi: 10.1016/j.cell.2019.05.031
63
Stuart, T., and Satija, R. (2019). Integrative single-cell analysis. Nat. Rev. Genet. 20, 257–272. doi: 10.1038/s41576-019-0093-7
64
Svensson, V., Vento-Tormo, R., and Teichmann, S. A. (2018). Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604. doi: 10.1038/nprot.2017.149
65
Thornton, J. W., and Desalle, R. (2000). Gene family evolution and homology: genomics meets phylogenetics. Annu. Rev. Genomics Hum. Genet. 1, 41–73. doi: 10.1146/annurev.genom.1.1.41
66
Tosches, M. A., Yamawaki, T. M., Naumann, R. K., Jacobi, A. A., Tushev, G., and Laurent, G. (2018). Evolution of pallium, hippocampus, and cortical cell types revealed by single-cell transcriptomics in reptiles. Science 360, 881–888. doi: 10.1126/science.aar4237
67
Tschopp, P., and Tabin, C. J. (2017). Deep homology in the age of next-generation sequencing. Philos. Trans. R. Soc. B Biol. Sci. 372:20150475. doi: 10.1098/rstb.2015.0475
68
Underwood, J. G., Montesclaros, L., Mikkelsen, T. S., Ericson, N. G., Schnall-Levin, M., Bharadwaj, R., et al. (2017). Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8:14049. doi: 10.1038/ncomms14049
69
van der Maaten, L., and Hinton, G. (2008). Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605. doi: 10.1007/s10479-011-0841-3
70
Wagner, D. E., Weinreb, C., Collins, Z. M., Briggs, J. A., Megason, S. G., and Klein, A. M. (2018). Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo. Science 360, 981–987. doi: 10.1126/science.aar4362
71
Welch, J. D., Kozareva, V., Ferreira, A., Vanderburg, C., Martin, C., and Macosko, E. Z. (2019). Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873.e17–1887.e17. doi: 10.1016/j.cell.2019.05.006
72
Yip, S. H., Sham, P. C., and Wang, J. (2018). Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data. Brief. Bioinform. doi: 10.1093/bib/bby011 [Epub ahead of print].
73
Zeisel, A., Hochgerner, H., Lönnerberg, P., Johnsson, A., Memic, F., van der Zwan, J., et al. (2018). Molecular architecture of the mouse nervous system. Cell 174, 999.e22–1014.e22. doi: 10.1016/j.cell.2018.06.021
74
Zeng, H., and Sanes, J. R. (2017). Neuronal cell-type classification: challenges, opportunities and the path forward. Nat. Rev. Neurosci. 18, 530–546. doi: 10.1038/nrn.2017.85
75
Zerbino, D. R., Achuthan, P., Akanni, W., Amode, M. R., Barrell, D., Bhai, J., et al. (2018). Ensembl 2018. Nucleic Acids Res. 46, D754–D761. doi: 10.1093/nar/gkx1098
76
Zheng, G. X. Y., Terry, J. M., Belgrader, P., Ryvkin, P., Bent, Z. W., Wilson, R., et al. (2017). Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8:14049. doi: 10.1038/ncomms14049

Summary

Keywords

evolutionary cell biology, single-cell RNA sequencing, transcriptome evolution, species comparisons, cell types

Citation

Shafer MER (2019) Cross-Species Analysis of Single-Cell Transcriptomic Data. Front. Cell Dev. Biol. 7:175. doi: 10.3389/fcell.2019.00175

Received

31 March 2019

Accepted

12 August 2019

Published

02 September 2019

Volume

7 - 2019

Edited by

Andreas Hejnol, University of Bergen, Norway

Reviewed by

Jordi Solana, Oxford Brookes University, United Kingdom; Eve Gazave, UMR7592 Institut Jacques Monod (IJM), France

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Maxwell E. R. Shafer, max.shafer@gmail.com; maxwell.shafer@unibas.ch

This article was submitted to Evolutionary Developmental Biology, a section of the journal Frontiers in Cell and Developmental Biology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
