Single-Cell RNA-Seq Technologies and Related Computational Data Analysis

Chen, Geng; Ning, Baitang; Shi, Tieliu

doi:10.3389/fgene.2019.00317

REVIEW article

Front. Genet., 05 April 2019

Sec. Computational Genomics

Volume 10 - 2019 | https://doi.org/10.3389/fgene.2019.00317

1. Center for Bioinformatics and Computational Biology, and Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
2. National Center for Toxicological Research, United States Food and Drug Administration, Jefferson, AR, United States

Abstract

Single-cell RNA sequencing (scRNA-seq) technologies allow the dissection of gene expression at single-cell resolution, which has revolutionized transcriptomic studies. A number of scRNA-seq protocols have been developed, each with unique features and distinct advantages and disadvantages. Due to technical limitations and biological factors, scRNA-seq data are noisier and more complex than bulk RNA-seq data. The high variability of scRNA-seq data raises computational challenges in data analysis. Although an increasing number of bioinformatics methods have been proposed for analyzing and interpreting scRNA-seq data, novel algorithms are still required to ensure the accuracy and reproducibility of results. In this review, we provide an overview of currently available single-cell isolation protocols and scRNA-seq technologies, and discuss the methods for diverse scRNA-seq data analyses including quality control, read mapping, gene expression quantification, batch effect correction, normalization, imputation, dimensionality reduction, feature selection, cell clustering, trajectory inference, differential expression calling, alternative splicing, allelic expression, and gene regulatory network reconstruction. Further, we outline the prospective development and applications of scRNA-seq technologies.

Introduction

Bulk RNA-seq technologies have been widely used over the past decade to study gene expression patterns at the population level. The advent of single-cell RNA sequencing (scRNA-seq) provides unprecedented opportunities for exploring gene expression profiles at the single-cell level. Because bulk RNA-seq mainly reflects the averaged gene expression across thousands of cells, scRNA-seq has become the method of choice for studying cell heterogeneity and the development of early embryos, which contain only a small number of cells. In recent years, scRNA-seq has been applied to various species, especially to diverse human tissues (both normal and cancerous), and these studies revealed meaningful cell-to-cell gene expression variability (Jaitin et al., 2014; Grun et al., 2015; Chen et al., 2016a; Cao et al., 2017; Rosenberg et al., 2018). With the innovation of sequencing technologies, several different scRNA-seq protocols have been proposed in the past few years, which have greatly facilitated the understanding of dynamic gene expression at single-cell resolution (Kolodziejczyk et al., 2015; Haque et al., 2017; Picelli, 2017; Chen et al., 2018). One of them is the highly efficient strategy LCM-seq (Nichterwitz et al., 2016), which combines laser capture microscopy (LCM) and Smart-seq2 (Picelli et al., 2013) for single-cell transcriptomics without tissue dissociation. Currently available scRNA-seq protocols can be split into two main categories based on the captured transcript coverage: (i) full-length transcript sequencing approaches [such as Smart-seq2 (Picelli et al., 2013), MATQ-seq (Sheng et al., 2017) and SUPeR-seq (Fan X. et al., 2015)]; and (ii) 3′-end [e.g., Drop-seq (Macosko et al., 2015), Seq-Well (Gierahn et al., 2017), Chromium (Zheng et al., 2017), and DroNC-seq (Habib et al., 2017)] or 5′-end [such as STRT-seq (Islam et al., 2011, 2012)] transcript sequencing technologies. Each scRNA-seq protocol has its own benefits and drawbacks, so different scRNA-seq approaches have distinct features and perform differently (Ziegenhain et al., 2017). When conducting a single-cell transcriptomic study, the scRNA-seq technology should be chosen by balancing the research goal against the sequencing cost.

Owing to the low amount of starting material, scRNA-seq suffers from low capture efficiency and high dropout rates (Haque et al., 2017). Compared to bulk RNA-seq, scRNA-seq produces noisier and more variable data. The technical noise and biological variation (e.g., stochastic transcription) raise substantial challenges for computational analysis of scRNA-seq data. A variety of tools have been designed for diverse bulk RNA-seq data analyses, but many of those methods cannot be directly applied to scRNA-seq data (Stegle et al., 2015). Except for short-read mapping, almost all data analyses (such as differential expression, cell clustering, and gene regulatory network inference) require somewhat different methods for scRNA-seq than for bulk RNA-seq. Due to the high technical noise, quality control (QC) is crucial for identifying and removing low-quality scRNA-seq data to obtain robust and reproducible results. Furthermore, some analyses, including alternative splicing (AS) detection, allelic expression exploration and RNA-editing identification, are not suitable for the 3′ or 5′-tag sequencing protocols of scRNA-seq, but can be applied to the data generated by whole-transcript scRNA-seq. On the other hand, an increasing number of tools have been proposed specifically for analyzing scRNA-seq data, and each method has its own pros and cons (Stegle et al., 2015; Bacher and Kendziorski, 2016). Therefore, to effectively handle the high variability of scRNA-seq data, attention should be paid to choosing appropriate analytical approaches.

This Review aims to summarize and discuss currently available scRNA-seq technologies and various data analysis methods. We first introduce distinct single-cell isolation protocols and various scRNA-seq technologies developed in recent years. Then we focus on the analyses of scRNA-seq data and highlight the analytical differences between bulk RNA-seq and scRNA-seq data. Considering the high technical noise and complexity of scRNA-seq data, we also provide recommendations on the selection of suitable tools to analyze scRNA-seq data and ensure the reproducibility of results.

Isolation of Single Cells

The first step of scRNA-seq is the isolation of individual cells (Figure 1), and capture efficiency remains a major challenge at this step. Currently, several different approaches are available for isolating single cells, including limiting dilution, micromanipulation, fluorescence-activated cell sorting (FACS), laser capture microdissection (LCM), and microfluidics (Gross et al., 2015; Kolodziejczyk et al., 2015; Hwang et al., 2018). The limiting dilution technique uses pipettes to isolate cells by dilution; its main limitation is inefficiency. Micromanipulation is a classical approach used to retrieve cells from samples with a small number of cells, such as early embryos or uncultivated microorganisms, but it is time-consuming and low throughput. FACS has been widely used for isolating single cells, but it requires large starting volumes (>10,000 cells) in suspension. LCM is an advanced strategy for isolating individual cells from solid tissues using a computer-aided laser system. Microfluidics is increasingly popular because of its low sample consumption, precise fluid control and low analysis cost. These single-cell isolation protocols have their own advantages and show distinct performances in terms of capture efficiency and purity of the target cells (Gross et al., 2015; Hu et al., 2016).

FIGURE 1

Currently Available ScRNA-Seq Technologies

To date, a number of scRNA-seq technologies have been proposed for single-cell transcriptomic studies (Table 1). The first scRNA-seq method was published by Tang et al. (2009), and many other scRNA-seq approaches were subsequently developed. These scRNA-seq technologies differ in at least one of the following aspects: (i) cell isolation; (ii) cell lysis; (iii) reverse transcription; (iv) amplification; (v) transcript coverage; (vi) strand specificity; and (vii) availability of unique molecular identifiers (UMIs), molecular tags that can be used to detect and quantify unique transcripts. One conspicuous difference among these scRNA-seq methods is that some of them can produce full-length (or nearly full-length) transcript sequencing data (e.g., Smart-seq2, SUPeR-seq, and MATQ-seq), whereas others only capture and sequence the 3′-end [such as Drop-seq, Seq-Well, DroNC-seq, and SPLiT-seq (Rosenberg et al., 2018)] or 5′-end (e.g., STRT-seq) of the transcripts (Table 1). Distinct scRNA-seq protocols have different strengths and weaknesses, and several published reviews have compared a portion of them in detail (Kolodziejczyk et al., 2015; Haque et al., 2017; Picelli, 2017; Ziegenhain et al., 2017). A previous study demonstrated that Smart-seq2 can detect a larger number of expressed genes than other scRNA-seq technologies including CEL-seq2 (Hashimshony et al., 2016), MARS-seq (Jaitin et al., 2014), Smart-seq (Ramskold et al., 2012), and Drop-seq protocols (Ziegenhain et al., 2017). Recently, Sheng et al. (2017) showed that another full-length transcript sequencing approach, MATQ-seq, could outperform Smart-seq2 in detecting low-abundance genes.

Table 1

Methods	Transcript coverage	UMI support	Strand-specific	References
Tang method	Nearly full-length	No	No	Tang et al., 2009
Quartz-Seq	Full-length	No	No	Sasagawa et al., 2013
SUPeR-seq	Full-length	No	No	Fan X. et al., 2015
Smart-seq	Full-length	No	No	Ramskold et al., 2012
Smart-seq2	Full-length	No	No	Picelli et al., 2013
MATQ-seq	Full-length	Yes	Yes	Sheng et al., 2017
STRT-seq and STRT/C1	5′-only	Yes	Yes	Islam et al., 2011, 2012
CEL-seq	3′-only	Yes	Yes	Hashimshony et al., 2012
CEL-seq2	3′-only	Yes	Yes	Hashimshony et al., 2016
MARS-seq	3′-only	Yes	Yes	Jaitin et al., 2014
CytoSeq	3′-only	Yes	Yes	Fan H.C. et al., 2015
Drop-seq	3′-only	Yes	Yes	Macosko et al., 2015
InDrop	3′-only	Yes	Yes	Klein et al., 2015
Chromium	3′-only	Yes	Yes	Zheng et al., 2017
SPLiT-seq	3′-only	Yes	Yes	Rosenberg et al., 2018
sci-RNA-seq	3′-only	Yes	Yes	Cao et al., 2017
Seq-Well	3′-only	Yes	Yes	Gierahn et al., 2017
DroNC-seq	3′-only	Yes	Yes	Habib et al., 2017
Quartz-Seq2	3′-only	Yes	Yes	Sasagawa et al., 2018

Summary of widely used scRNA-seq technologies.

Compared to 3′-end or 5′-end counting protocols, full-length scRNA-seq methods have clear advantages in isoform usage analysis, allelic expression detection, and RNA editing identification owing to their superior transcript coverage. Moreover, for detecting certain lowly expressed genes/transcripts, full-length scRNA-seq approaches could be better than 3′ sequencing methods (Ziegenhain et al., 2017). Notably, droplet-based technologies [e.g., Drop-seq (Macosko et al., 2015), InDrop (Klein et al., 2015), and Chromium (Zheng et al., 2017)] can generally provide a larger throughput of cells and a lower sequencing cost per cell compared to whole-transcript scRNA-seq. Thus, droplet-based protocols are suitable for profiling large numbers of cells to identify the cell subpopulations of complex tissues or tumor samples.

Notably, several scRNA-seq technologies can capture both polyA+ and polyA- RNAs, such as SUPeR-seq (Fan X. et al., 2015) and MATQ-seq (Sheng et al., 2017). These protocols are particularly useful for sequencing long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs). Many studies have demonstrated that lncRNAs and circRNAs play important roles in diverse biological processes of cells and may serve as crucial biomarkers for cancers (Barrett and Salzman, 2016; Chen et al., 2016b; Quinn and Chang, 2016; Kristensen et al., 2018); therefore, such scRNA-seq methods can provide unprecedented opportunities to comprehensively explore the expression dynamics of both protein-coding and noncoding RNAs at the single-cell level.

Compared to traditional bulk RNA-seq technologies, scRNA-seq protocols suffer from higher technical variation. In order to estimate the technical variance among different cells, spike-ins [such as External RNA Control Consortium (ERCC) controls (External, 2005)] and UMIs have been widely used in the corresponding scRNA-seq methods. RNA spike-ins are RNA transcripts of known sequence and quantity that are used to calibrate measurements in assays such as RNA-seq, while UMIs can theoretically enable the estimation of absolute molecular counts. It is worth noting that ERCC controls and UMIs are not applicable to all scRNA-seq technologies due to inherent protocol differences. Spike-ins are used in approaches like Smart-seq2 and SUPeR-seq but are not compatible with droplet-based methods, whereas UMIs are typically applied in 3′-end sequencing technologies [such as Drop-seq (Macosko et al., 2015), InDrop (Klein et al., 2015), and MARS-seq (Jaitin et al., 2014)]. Consequently, users can select a suitable scRNA-seq method according to its technical properties and advantages, the number of cells to be sequenced and cost considerations.
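
As a minimal sketch of how UMIs support molecule counting, the following Python snippet collapses reads that share the same cell barcode, gene, and UMI into a single molecule. The function name, the tuple layout, and the toy reads are illustrative assumptions and do not come from any of the protocols above; real pipelines typically also account for sequencing errors within the UMI, which this sketch ignores.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into per-cell, per-gene molecule counts.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per
    aligned read (hypothetical layout). Reads sharing all three fields are
    treated as amplification duplicates of one captured molecule.
    """
    molecules = defaultdict(set)
    for cell, umi, gene in reads:
        molecules[(cell, gene)].add(umi)
    # The number of distinct UMIs approximates the number of molecules.
    return {key: len(umis) for key, umis in molecules.items()}


# Toy input (hypothetical barcodes, UMIs, and gene names).
reads = [
    ("CELL_A", "AACG", "GeneX"),
    ("CELL_A", "AACG", "GeneX"),  # duplicate of the read above
    ("CELL_A", "TTGC", "GeneX"),
    ("CELL_A", "AACG", "GeneY"),
    ("CELL_B", "AACG", "GeneX"),
]
umi_counts = count_umis(reads)
```

In this toy input, the five reads correspond to four molecules, because the two identical reads for GeneX in CELL_A are counted once.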

Read Alignment and Expression Quantification of ScRNA-Seq Data

The mapping ratio of reads is an important indicator of the overall quality of scRNA-seq data. Since both scRNA-seq and bulk RNA-seq technologies generally sequence transcripts into reads to generate raw data in fastq format, read alignment does not differ between these two types of RNA-seq data. The mapping tools originally developed for bulk RNA-seq are therefore also applicable to scRNA-seq data. Numerous spliced alignment programs have been designed for mapping RNA-seq data and have been extensively discussed previously (Li and Homer, 2010; Chen et al., 2011). Generally, read mapping algorithms fall into two main categories: spaced-seed indexing based and Burrows-Wheeler transform (BWT) based (Li and Homer, 2010). Currently popular aligners like TopHat2 (Kim et al., 2013), STAR (Dobin and Gingeras, 2015), and HISAT (Kim et al., 2015) perform well in mapping speed and accuracy, and they can efficiently map billions of reads to the reference genome or transcriptome (Table 2). STAR is a suffix-array based method and is faster than TopHat2, but it requires a large amount of memory (28 gigabytes for the human genome) for read mapping (Dobin and Gingeras, 2015). Engstrom et al. systematically evaluated 26 read alignment protocols (not including HISAT) and found that different mapping tools exhibit distinct strengths and weaknesses; for example, some programs map faster but detect splice junctions less accurately (Engstrom et al., 2013). HISAT is based on the BWT and the Ferragina-Manzini (FM) index. Kim et al. (2015) showed that HISAT was the fastest tool at the time while achieving equal or better accuracy than other available aligners.

Table 2

Tools	Category	URL	References
TopHat2	Read mapping	https://ccb.jhu.edu/software/tophat/index.shtml	Kim et al., 2013
STAR	Read mapping	https://github.com/alexdobin/STAR	Dobin and Gingeras, 2015
HISAT2	Read mapping	https://ccb.jhu.edu/software/hisat2/index.shtml	Kim et al., 2015
Cufflinks	Expression quantification	https://github.com/cole-trapnell-lab/cufflinks	Trapnell et al., 2010
RSEM	Expression quantification	https://github.com/deweylab/RSEM	Li and Dewey, 2011
StringTie	Expression quantification	https://github.com/gpertea/stringtie	Pertea et al., 2015

Tools for read mapping and expression quantification of scRNA-seq data.

For gene/transcript expression quantification, distinct approaches are needed depending on the range of transcript sequence captured by scRNA-seq. The data generated by whole-transcript scRNA-seq methods (such as Smart-seq2 and MATQ-seq) can be analyzed with the software developed for bulk RNA-seq to quantify gene/transcript expression. Two main approaches are available for transcriptome reconstruction: de novo assembly (which does not need a reference genome) and reference-based or genome-guided assembly (Chen et al., 2017b). De novo transcriptome assembly methods are primarily applied to organisms that lack a reference genome, and are generally less accurate than genome-guided assembly (Garber et al., 2011). Popular reference-based tools including Cufflinks (Trapnell et al., 2010), RSEM (Li and Dewey, 2011), and StringTie (Pertea et al., 2015) have been broadly used in many scRNA-seq studies to obtain relative gene/transcript expression estimates in reads or fragments per kilobase per million mapped reads (RPKM or FPKM) or transcripts per million (TPM) (Table 2). Pertea et al. (2015) stated that StringTie outperforms other genome-guided approaches in gene/transcript reconstruction and expression quantification. On the other hand, for the 3′-end scRNA-seq protocols (e.g., CEL-seq2, MARS-seq, Drop-seq, and InDrop), specific algorithms are required to calculate gene/transcript expression based on UMIs. SAVER (single-cell analysis via expression recovery) is an efficient UMI-based tool recently proposed for accurately estimating gene expression of single cells (Huang et al., 2018). In theory, UMI-based scRNA-seq can largely reduce the technical noise, which remarkably benefits the estimation of absolute transcript counts (Islam et al., 2014).
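
The relative expression units mentioned above can be written out in a few lines. The sketch below computes FPKM and TPM from fragment counts and transcript lengths for a single sample using the standard definitions; the function names and the toy numbers are illustrative assumptions, and tools such as Cufflinks, RSEM, and StringTie use more elaborate models (for example, effective lengths) than this plain arithmetic.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    # count / (length in kb) / (total in millions) = count * 1e9 / (length * total)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    denom = sum(rates)
    return [r * 1e6 / denom for r in rates]


# Toy sample: three genes whose counts are proportional to their lengths,
# so all three have the same length-normalized expression.
toy_counts = [100, 200, 700]
toy_lengths = [1000, 2000, 7000]
toy_fpkm = fpkm(toy_counts, toy_lengths)
toy_tpm = tpm(toy_counts, toy_lengths)
```

Because TPM values always sum to one million within a sample, they are somewhat easier to compare across samples than FPKM values, whose sum depends on the length distribution of the expressed transcripts.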

Quality Control of ScRNA-Seq Data

Because of limitations including biased transcript coverage, low capture efficiency, and limited sequencing coverage, scRNA-seq data have a higher level of technical noise than bulk RNA-seq data (Kolodziejczyk et al., 2015). Even with the most sensitive scRNA-seq protocol, some specific transcripts frequently go undetected (termed dropout events) (Haque et al., 2017). Generally, scRNA-seq experiments generate a portion of low-quality data from cells that are broken, dead, or captured together with other cells (Ilicic et al., 2016). These low-quality cells will hinder the downstream analysis and may lead to misinterpretation of the data. Accordingly, QC of scRNA-seq data is crucial to identify and remove the low-quality cells.

To exclude low-quality cells from scRNA-seq, close attention should be paid to avoiding multiple cells or dead cells in the cell capture step. After sequencing, a series of QC analyses are required to eliminate the data from low-quality cells. Samples that contain only a small number of reads should be discarded first, since insufficient sequencing depth may lead to the loss of a large portion of lowly and moderately expressed genes. Then tools initially developed for QC of bulk RNA-seq data, such as FastQC¹, can be employed to check the sequencing quality of scRNA-seq data. Moreover, after read alignment, samples with a very low mapping ratio should be eliminated because they contain many unmappable reads that might result from RNA degradation. If extrinsic spike-ins (such as ERCC) were used in scRNA-seq, technical noise can be estimated. Cells with an extremely high proportion of reads mapped to the spike-ins were probably broken during the cell capture process and should be removed (Ilicic et al., 2016). In broken cells, cytoplasmic RNAs are usually lost while mitochondrial RNAs are retained; thus, the ratio of reads mapped to the mitochondrial genome is also informative for identifying low-quality cells (Bacher and Kendziorski, 2016). Additionally, the number of expressed genes/transcripts detected in each cell is also informative. If only a small number of genes can be detected in a cell, this cell is probably damaged or dead, or has suffered from RNA degradation. Considering the high noise of scRNA-seq data, a threshold of 1 FPKM/RPKM has usually been applied to define expressed genes. Some QC methods for scRNA-seq have been proposed (Stegle et al., 2015; Ilicic et al., 2016), including SinQC (Jiang et al., 2016) and Scater (McCarthy et al., 2017).
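
The per-cell criteria described above (sequencing depth, number of detected genes, and mitochondrial read fraction) can be sketched as follows. This is an illustration only: the function names, the default thresholds, and the assumption that mitochondrial genes carry the "MT-" symbol prefix (as in human gene nomenclature) are choices made for this example, not recommendations from the tools cited above, and suitable cutoffs depend on the protocol and tissue.

```python
def qc_metrics(cell_counts, mito_prefix="MT-"):
    """Summarize one cell given a dict mapping gene symbol -> read/UMI count."""
    total = sum(cell_counts.values())
    detected = sum(1 for count in cell_counts.values() if count > 0)
    mito = sum(c for gene, c in cell_counts.items() if gene.startswith(mito_prefix))
    return {
        "total_counts": total,
        "genes_detected": detected,
        "mito_fraction": mito / total if total else 0.0,
    }


def passes_qc(metrics, min_counts=1000, min_genes=200, max_mito=0.2):
    """Apply illustrative (assumed) thresholds to the metrics of one cell."""
    return (
        metrics["total_counts"] >= min_counts
        and metrics["genes_detected"] >= min_genes
        and metrics["mito_fraction"] <= max_mito
    )


# Toy cells with three genes each; thresholds are lowered to match the toy scale.
intact_cell = {"ACTB": 50, "GAPDH": 40, "MT-CO1": 10}
broken_cell = {"ACTB": 2, "GAPDH": 0, "MT-CO1": 18}
intact_ok = passes_qc(qc_metrics(intact_cell), min_counts=50, min_genes=3)
broken_ok = passes_qc(qc_metrics(broken_cell), min_counts=50, min_genes=3)
```

In the toy data, the second cell fails on all three criteria: few total counts, fewer detected genes, and a mitochondrial fraction of 0.9.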

Batch Effect Correction

Batch effect is a common source of technical variation in high-throughput sequencing experiments. The innovation and decreasing cost of scRNA-seq enable many studies to profile the transcriptomes of very large numbers of cells. Large-scale scRNA-seq data sets might be generated separately by different operators at different times, and could also be produced in multiple laboratories using different cell dissociation protocols, library preparation approaches and/or sequencing platforms. These factors introduce systematic error and confound technical and biological variability, so that the gene expression profile in one batch systematically differs from that in another (Leek et al., 2010; Hicks et al., 2018). Therefore, batch effect is a major challenge in scRNA-seq data analysis, which may mask the underlying biology and cause spurious results. To avoid incorrect data integration and interpretation, batch effects must be corrected before the downstream analysis. Because of the differences in data features between scRNA-seq and bulk RNA-seq, batch-correction approaches proposed specifically for bulk RNA-seq [e.g., RUVseq (Risso et al., 2014) and svaseq (Leek, 2014)] may not be suitable for scRNA-seq. Several methods have recently been designed to correct or quantify batch effects in scRNA-seq data, such as MNN (mutual nearest neighbor) (Haghverdi et al., 2018) and kBET (k-nearest neighbor batch effect test) (Buttner et al., 2019). MNN corrects batch effects using the data from the most similar cells in different batches. kBET is a χ²-based method for quantifying batch effects in scRNA-seq data. These scRNA-seq-specific approaches can perform better than the methods developed for bulk RNA-seq (Haghverdi et al., 2018; Buttner et al., 2019).
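
To make the idea of "most similar cells in different batches" concrete, the sketch below identifies mutual nearest neighbor pairs between two batches: cell i in the first batch and cell j in the second are paired only if each is among the other's k nearest neighbors. This covers just the pairing step under simplifying assumptions (Euclidean distance, brute-force search, hypothetical function names and toy coordinates); it is not the published MNN implementation, which includes further steps to turn such pairs into a correction.

```python
import math


def nearest_neighbors(queries, references, k):
    """For each query point, return the index set of its k nearest references."""
    result = []
    for q in queries:
        order = sorted(range(len(references)), key=lambda j: math.dist(q, references[j]))
        result.append(set(order[:k]))
    return result


def mutual_nearest_pairs(batch1, batch2, k=1):
    """Return (i, j) pairs where batch1[i] and batch2[j] are mutual k-nearest neighbors."""
    nn_1_in_2 = nearest_neighbors(batch1, batch2, k)
    nn_2_in_1 = nearest_neighbors(batch2, batch1, k)
    return [
        (i, j)
        for i, neighbors in enumerate(nn_1_in_2)
        for j in sorted(neighbors)
        if i in nn_2_in_1[j]
    ]


# Toy data: two cell types shifted by a small batch offset, plus one cell
# in batch 2 that has no counterpart in batch 1.
toy_batch1 = [(0.0, 0.0), (10.0, 10.0)]
toy_batch2 = [(1.0, 1.0), (11.0, 11.0), (5.0, 20.0)]
toy_pairs = mutual_nearest_pairs(toy_batch1, toy_batch2, k=1)
```

Requiring the neighbor relationship to hold in both directions leaves the third cell of the second batch unpaired, which is the property that makes such pairs useful as anchors between batches.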

Normalization of ScRNA-Seq Data

To correctly interpret the results from scRNA-seq data, normalization is an essential step that recovers the signal of interest by adjusting for unwanted biases resulting from capture efficiency, sequencing depth, dropouts, and other technical effects. Technical noise is a notable problem in scRNA-seq due to the low starting material and challenging experimental protocols. Normalization of scRNA-seq data benefits downstream analyses including cell subpopulation identification and differential expression calling. In general, normalization can be divided into two different types: within-sample normalization and between-sample normalization (Vallejos et al., 2017). Within-sample normalization (such as RPKM/FPKM and TPM) aims to remove gene-specific biases (e.g., GC content and gene length), which makes gene expression comparable within one sample. In contrast, between-sample normalization adjusts for sample-specific differences (e.g., sequencing depth and capture efficiency) to enable the comparison of gene expression between samples. Simple normalization strategies are generally based on sequencing depth or the upper quartile. If spike-ins or UMIs are used in the scRNA-seq protocol, normalization can be refined based on the performance of the spike-ins/UMIs (Bacher and Kendziorski, 2016).

A number of approaches have been developed for between-sample normalization of bulk RNA-seq data, such as DESeq2 (Love et al., 2014) and trimmed mean of M values (TMM) (Robinson and Oshlack, 2010). DESeq2 calculates a scaling factor based on the read counts across different samples, while TMM removes the extreme log fold changes (Vallejos et al., 2017). However, bulk-based normalization approaches may not be suitable for single-cell transcriptomic data. Because scRNA-seq generates abundant zero-expression values and has a higher level of technical variation than bulk RNA-seq, using bulk RNA-seq normalization approaches may cause overcorrection in scRNA-seq for lowly expressed genes (Vallejos et al., 2017). Several normalization methods have been proposed for scRNA-seq data, such as SCnorm (Bacher et al., 2017), SAMstrt (Katayama et al., 2013) and a recently introduced deconvolution approach that uses the summed expression values across pools of cells to conduct normalization (Lun et al., 2016). SCnorm is based on quantile regression, while SAMstrt relies on spike-ins. Bacher et al. (2017) argued that traditional normalization methods developed for bulk RNA-seq may introduce artifacts when normalizing scRNA-seq data, while SCnorm can effectively normalize scRNA-seq data and improve principal component analysis (PCA) and the identification of differentially expressed genes.
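
One way to see why abundant zeros are a problem for bulk-style scaling factors is to write out a median-of-ratios size factor, in the spirit of the approach used by DESeq2. The sketch below is a simplified illustration with assumed function names and toy counts, not the package's implementation: the reference for each gene is a geometric mean across samples, which is only defined for genes with a nonzero count in every sample.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Compute one size factor per sample (or cell).

    `counts` is a list of samples, each a list of gene counts in the same
    gene order. Genes with a zero in any sample are skipped because their
    geometric mean is zero.
    """
    n_genes = len(counts[0])
    log_reference = {}
    for g in range(n_genes):
        values = [sample[g] for sample in counts]
        if all(v > 0 for v in values):
            log_reference[g] = sum(math.log(v) for v in values) / len(values)
    if not log_reference:
        raise ValueError("no gene has a nonzero count in every sample")
    return [
        math.exp(median(math.log(sample[g]) - ref for g, ref in log_reference.items()))
        for sample in counts
    ]


# Toy data: the second sample was sequenced twice as deeply as the first;
# the last gene has a zero in the first sample and is therefore ignored.
toy_samples = [[10, 20, 30, 0], [20, 40, 60, 5]]
toy_size_factors = median_of_ratios_size_factors(toy_samples)
```

With two samples this works, but as the number of cells grows and dropouts accumulate, fewer and fewer genes are nonzero in every cell, so the estimate rests on a shrinking and unrepresentative gene set or fails outright. Pooling cells before estimating factors, as in the deconvolution approach cited above, is one response to this problem.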

Imputation of ScRNA-Seq Data

Single-cell RNA sequencing data generally contain many missing values, or dropouts, caused by failed amplification of the original RNAs. The frequency of dropout events in scRNA-seq is protocol-dependent, and is closely associated with the number of sequencing reads generated for each cell (Svensson et al., 2017). Dropout events increase the cell-to-cell variability, affect the signal of every gene, and obscure the detection of gene-gene relationships. Therefore, dropouts can substantially affect the downstream analyses, since a significant portion of truly expressed transcripts may not be detectable in scRNA-seq. Imputation is a useful strategy for replacing the missing data (dropouts) with substituted values. Although some methods have been proposed for imputation of bulk RNA-seq data, they are not directly applicable to scRNA-seq data (Zhang and Zhang, 2018). Several imputation methods have been recently developed for scRNA-seq, including SAVER (Huang et al., 2018), MAGIC (van Dijk et al., 2018), ScImpute (Li and Li, 2018), DrImpute (Gong et al., 2018), and AutoImpute (Talwar et al., 2018). SAVER is a Bayesian-based model designed for UMI-based scRNA-seq data to recover the true expression level of all genes. MAGIC imputes gene expression by building a Markov affinity-based graph. The developers of ScImpute suggested that SAVER and MAGIC may change the expression of genes unaffected by dropouts, while ScImpute can impute the dropout values without introducing new biases by using information on the same genes from similar cells in which they are unlikely to be affected by dropouts. DrImpute is a clustering-based approach and can effectively separate the dropout zeros from true zeros. AutoImpute is an autoencoder-based method that learns the inherent distribution of scRNA-seq data to impute the missing values. Recently, Zhang and Zhang (2018) evaluated different imputation methods and found that the performances of these approaches are correlated with their model hypotheses and scalability.

Dimensionality Reduction and Feature Selection

Single-cell RNA sequencing data are high-dimensional, potentially involving thousands of genes and a large number of cells. Dimensionality reduction and feature selection are two main strategies for dealing with high-dimensional data (Andrews and Hemberg, 2018a). Dimensionality reduction methods generally project the data into a lower dimensional space while optimally preserving some key properties of the original data. PCA is a linear dimensionality reduction algorithm, which assumes that the data are approximately normally distributed. t-distributed stochastic neighbor embedding (t-SNE) is a non-linear approach mainly designed for visualizing high-dimensional data (van der Maaten and Hinton, 2008). Both PCA and t-SNE have been broadly used in diverse scRNA-seq studies to reduce the data dimension and visualize the cells separated into distinct subpopulations (Chen et al., 2016a; Rosenberg et al., 2018). It is worth noting that PCA cannot effectively represent the complex structure of scRNA-seq data, and that t-SNE is slow to compute and can produce different embeddings when the same dataset is processed multiple times. Recently, UMAP (uniform manifold approximation and projection) (Becht et al., 2018) and scvis (Ding et al., 2018) have been introduced for reducing the dimensions of scRNA-seq data. Becht et al. (2018) showed that UMAP provides faster run times, higher reproducibility and a more meaningful organization of cell clusters than other dimensionality reduction approaches.

Feature selection removes the uninformative genes and identifies the most relevant features to reduce the number of dimensions used in downstream analysis. Reducing the number of genes through feature selection can greatly speed up computations on large-scale scRNA-seq data (Andrews and Hemberg, 2018b). Differential expression is a widely used method for feature selection in bulk RNA-seq experiments, but it is hard to apply to scRNA-seq data, since the predetermined and/or homogeneous subpopulations needed for differential expression calling of scRNA-seq data [e.g., SCDE (Kharchenko et al., 2014)] are often unknown. Unsupervised feature selection algorithms designed specifically for scRNA-seq data can be divided into the following groups: (i) highly variable genes (HVG) based; (ii) spike-in based; and (iii) dropout-based (Andrews and Hemberg, 2018a). HVG methods rely on the assumption that highly variable expression of a gene across cells results from biological effects rather than technical noise. The HVG approaches include the algorithm proposed by Brennecke et al. (2013) and FindVariableGenes (FVG) implemented in Seurat (Satija et al., 2015). Spike-in based approaches identify the genes showing significantly higher variance than spike-ins with similar expression levels [e.g., scLVM (Buettner et al., 2015) and BASiCS (Vallejos et al., 2015)], an idea similar to that of HVG methods. Dropout-based methods, like M3Drop (Andrews and Hemberg, 2018b), take advantage of the dropout distribution of scRNA-seq data to perform feature selection. Andrews and Hemberg showed that their M3Drop tool outperforms existing variance-based feature selection approaches.
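
The intuition behind HVG-based selection can be illustrated by ranking genes on a simple dispersion score, the variance divided by the mean (Fano factor). This is a deliberately crude stand-in with an assumed function name and toy values: the published HVG methods cited above model how variance depends on mean expression rather than using a raw ratio, and they are not reproduced here.

```python
from statistics import mean, pvariance


def rank_genes_by_dispersion(expression):
    """Rank genes from most to least variable by variance-to-mean ratio.

    `expression` maps gene name -> list of expression values across cells.
    Genes with zero mean are dropped because the ratio is undefined.
    """
    scores = {}
    for gene, values in expression.items():
        m = mean(values)
        if m > 0:
            scores[gene] = pvariance(values) / m
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical gene names): two genes with the same mean but very
# different variability, and one gene that is never detected.
toy_expression = {
    "flat": [5, 5, 5, 5],
    "variable": [0, 10, 0, 10],
    "silent": [0, 0, 0, 0],
}
toy_ranking = rank_genes_by_dispersion(toy_expression)
```

A score of this kind cannot tell biological variability from technical noise, which is why the spike-in and dropout-based alternatives described above exist.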

Cell Subpopulation Identification

A key goal of scRNA-seq data analysis is to identify cell subpopulations (which often correspond to distinct cell types) within a given condition or tissue and thereby unravel cellular heterogeneity. Notably, cell subpopulation identification should be carried out after QC and normalization of scRNA-seq data; otherwise, artifacts could be introduced. Approaches for clustering cells fall into two main categories depending on whether prior information is used. Methods that cluster cells using a set of known markers are prior-information based. Alternatively, unsupervised clustering methods can be used for de novo identification of cell populations from scRNA-seq data. Unsupervised clustering algorithms can be broadly divided into four types: (i) k-means; (ii) hierarchical clustering; (iii) density-based clustering; and (iv) graph-based clustering (Andrews and Hemberg, 2018a). K-means is a fast approach that assigns cells to the nearest cluster center, but it requires the number of clusters to be specified in advance. Hierarchical clustering can determine the relationships between clusters, but it is generally slower than k-means. Density-based clustering methods need a large number of samples to accurately calculate densities and usually assume that all clusters have equal density. Graph-based clustering can be considered an extension of density-based clustering, and it can be applied to millions of cells. Some clustering methods have been specially designed for scRNA-seq data, such as single-cell consensus clustering (SC3) (Kiselev et al., 2017) and the clustering approach implemented in Seurat (Satija et al., 2015), which can facilitate the identification of cell subpopulations (Table 3). SC3 is an unsupervised approach that combines multiple clustering approaches and achieves high accuracy and robustness in single-cell clustering. Seurat identifies cell clusters mainly with a shared nearest neighbor (SNN) clustering algorithm. Once the subpopulations are determined, the markers that best discriminate distinct subpopulations are usually identified through differential expression calling or analysis of variance (ANOVA).
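
To make the k-means description above concrete, the following is a minimal sketch in plain Python, assuming that cells have already been quality-controlled, normalized, and reduced to a small number of features. The function names (`squared_distance`, `kmeans`) and the toy values are illustrative assumptions and are not taken from any of the tools in Table 3.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(cells, k, max_iter=100, seed=0):
    """Cluster cells (equal-length lists of floats) into k groups."""
    rng = random.Random(seed)
    # Initialise the centers with k randomly chosen cells.
    centers = [list(cell) for cell in rng.sample(cells, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each cell joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(cell, centers[j]))
            for cell in cells
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(cells, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy data: six "cells" measured on two genes (values are made up).
toy_cells = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
             [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
toy_labels, toy_centers = kmeans(toy_cells, k=2)
```

The sketch shows the limitation noted above: `k` has to be supplied by the user, and a poor choice merges or splits true subpopulations.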

Table 3

Methods	URL	References
SC3	http://bioconductor.org/packages/SC3	Kiselev et al., 2017
ZIFA	https://github.com/epierson9/ZIFA	Pierson and Yau, 2015
Destiny	https://github.com/theislab/destiny	Angerer et al., 2016
SNN-Cliq	http://bioinfo.uncc.edu/SNNCliq/	Xu and Su, 2015
RaceID	https://github.com/dgrun/RaceID	Grun et al., 2015
SCUBA	https://github.com/gcyuan/SCUBA	Marco et al., 2014
BackSPIN	https://github.com/linnarsson-lab/BackSPIN	Zeisel et al., 2015
PAGODA	http://hms-dbmi.github.io/scde/	Fan et al., 2016
CIDR	https://github.com/VCCRI/CIDR	Lin et al., 2017
pcaReduce	https://github.com/JustinaZ/pcaReduce	Zurauskiene and Yau, 2016
Seurat	https://github.com/satijalab/seurat	Satija et al., 2015
TSCAN	https://github.com/zji90/TSCAN	Ji and Ji, 2016

Subpopulation identification methods for scRNA-seq data.
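
The shared nearest neighbor idea mentioned for graph-based clustering can be sketched as follows. This is a simplified illustration and not the implementation used by Seurat or SNN-Cliq: it assumes that similarity between two cells is the Jaccard overlap of their k-nearest-neighbor sets, and it takes connected components of the thresholded graph as clusters, whereas real tools typically apply a community-detection step. All names and thresholds (`knn_sets`, `snn_clusters`, `min_jaccard`) are hypothetical.

```python
def knn_sets(cells, k):
    """For each cell, return the set holding itself and its k nearest cells."""
    neighbors = []
    for i, a in enumerate(cells):
        dists = sorted(
            (sum((x - y) ** 2 for x, y in zip(a, b)), j)
            for j, b in enumerate(cells) if j != i
        )
        neighbors.append({i} | {j for _, j in dists[:k]})
    return neighbors


def snn_clusters(cells, k=2, min_jaccard=0.5):
    """Link cells whose neighbor sets overlap strongly, then label components."""
    nn = knn_sets(cells, k)
    n = len(cells)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            jaccard = len(nn[i] & nn[j]) / len(nn[i] | nn[j])
            if jaccard >= min_jaccard:
                adjacency[i].add(j)
                adjacency[j].add(i)
    # Connected components of the SNN graph serve as clusters in this sketch.
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adjacency[u]:
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


# Toy data: two well-separated groups of three cells (values are made up).
snn_cells = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
             [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
snn_labels = snn_clusters(snn_cells, k=2)
```

Unlike k-means, this approach does not need the number of clusters in advance; the neighborhood size and the overlap threshold control the resolution instead.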

Differential Expression Analysis of ScRNA-Seq Data

Differential expression analysis identifies genes that are significantly differentially expressed (DEGs) between distinct subpopulations or groups of cells. DEGs are crucial for interpreting the biological differences between the compared conditions. The technical variability, high noise (e.g., dropouts), and massive sample size of scRNA-seq data raise challenges in differential expression calling (McDavid et al., 2013). Moreover, multiple cell states can exist within a population of cells, leading to multimodal gene expression (Vallejos et al., 2016). Tools originally developed for bulk RNA-seq data have been used in many single-cell studies to identify DEGs, but the applicability of these methods to scRNA-seq data is still unclear. In recent years, several methods have been proposed specifically for differential expression calling on scRNA-seq data, such as MAST (Finak et al., 2015), SCDE (Kharchenko et al., 2014), DEsingle (Miao et al., 2018), Census (Qiu et al., 2017), and BCseq (Chen and Zheng, 2018) (Table 4). MAST is based on linear model fitting and likelihood ratio testing. SCDE is a Bayesian approach that uses a low-magnitude Poisson process to account for dropouts. DEsingle employs a zero-inflated negative binomial model to estimate the proportions of dropouts and real zeros. BCseq mitigates technical noise in a data-adaptive manner. Soneson and Robinson recently assessed 36 differential expression methods (including tools designed for scRNA-seq and for bulk RNA-seq data) and revealed significant differences among these approaches in the characteristics and number of DEGs (Soneson and Robinson, 2018). As more tools for differential expression analysis of scRNA-seq data are developed, users are encouraged to choose tools designed specifically for scRNA-seq, given the complex features of these data.
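
As an illustration of likelihood ratio testing in the presence of many zeros, the sketch below compares the detection rate of a single gene (the fraction of cells with a non-zero value) between two groups of cells. It is a deliberately reduced example and is not the model used by MAST, SCDE, or DEsingle: it looks only at whether a gene is detected and ignores expression magnitude and covariates. The function names and toy values are assumptions made for this example.

```python
import math


def binomial_loglik(successes, total, p):
    """Binomial log-likelihood (without the constant binomial coefficient)."""
    if p <= 0.0 or p >= 1.0:
        # With p set to the observed rate, 0 or 1 fits the data exactly.
        return 0.0
    return successes * math.log(p) + (total - successes) * math.log(1.0 - p)


def detection_rate_lrt(group_a, group_b):
    """Likelihood ratio test for a difference in detection rate of one gene.

    group_a, group_b: expression values of the gene in each group of cells.
    Returns (test statistic, p-value from a chi-square with 1 d.f.).
    """
    n_a, n_b = len(group_a), len(group_b)
    k_a = sum(1 for value in group_a if value > 0)
    k_b = sum(1 for value in group_b if value > 0)
    # Null model: one shared detection rate. Alternative: one rate per group.
    loglik_null = binomial_loglik(k_a + k_b, n_a + n_b, (k_a + k_b) / (n_a + n_b))
    loglik_alt = (binomial_loglik(k_a, n_a, k_a / n_a)
                  + binomial_loglik(k_b, n_b, k_b / n_b))
    statistic = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Toy gene: detected in 1 of 10 cells in group A and 8 of 10 in group B.
toy_a = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
toy_b = [3, 5, 0, 2, 4, 6, 1, 2, 3, 0]
toy_stat, toy_p = detection_rate_lrt(toy_a, toy_b)
```

In a real analysis this test would be run for every gene and the p-values corrected for multiple testing; dedicated single-cell tools additionally model the expression level in cells where the gene is detected.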

Table 4

Methods	Category	URL	References
ROTS	Single cell	https://bioconductor.org/packages/release/bioc/html/ROTS.html	Seyednasrollah et al., 2016
MAST	Single cell	https://github.com/RGLab/MAST	Finak et al., 2015
BCseq	Single cell	https://bioconductor.org/packages/devel/bioc/html/bcSeq.html	Chen and Zheng, 2018
SCDE	Single cell	http://hms-dbmi.github.io/scde/	Kharchenko et al., 2014
DEsingle	Single cell	https://bioconductor.org/packages/DEsingle	Miao et al., 2018
Census	Single cell	http://cole-trapnell-lab.github.io/monocle-release/	Qiu et al., 2017
D3E	Single cell	https://github.com/hemberg-lab/D3E	Delmans and Hemberg, 2016
BPSC	Single cell	https://github.com/nghiavtr/BPSC	Vu et al., 2016
DESeq2	Bulk	https://bioconductor.org/packages/release/bioc/html/DESeq2.html	Love et al., 2014
edgeR	Bulk	https://bioconductor.org/packages/release/bioc/html/edgeR.html	Robinson et al., 2010
Limma	Bulk	http://bioconductor.org/packages/release/bioc/html/limma.html	Ritchie et al., 2015
Ballgown	Bulk	http://www.bioconductor.org/packages/release/bioc/html/ballgown.html	Frazee et al., 2015

Differential expression analysis tools for RNA-seq data.

Cell Lineage and Pseudotime Reconstruction

The cells in many biological systems exhibit a continuous spectrum of states and undergo transitions between different cellular states. Such dynamic processes within a population of cells can be computationally modeled by reconstructing the cell trajectory and pseudotime from scRNA-seq data. Pseudotime is an ordering of cells along the trajectory of a continuous developmental process in a system, which allows the identification of the cell types at the beginning, intermediate, and end states of the trajectory (Griffiths et al., 2018). Besides revealing gene expression dynamics across cells, single-cell trajectory inference can also help identify the factors that trigger state transitions. A number of tools have been proposed for trajectory inference, e.g., Monocle (Trapnell et al., 2014), Waterfall (Shin et al., 2015), Wishbone (Setty et al., 2016), TSCAN (Ji and Ji, 2016), Monocle2 (Qiu et al., 2017), Slingshot (Street et al., 2018), and CellRouter (Lummertz da Rocha et al., 2018) (Table 5). The resulting trajectory topology can be linear, bifurcating, or a tree/graph structure. Monocle applies independent component analysis (ICA) and then builds a minimum spanning tree (MST) over the cells to search for the longest backbone. Monocle2 uses a distinct approach that incorporates unsupervised data-driven methods with reversed graph embedding (RGE), which is more robust and much faster than Monocle. Slingshot is a cluster-based approach for identifying multiple trajectories with varying levels of supervision. CellRouter utilizes flow networks to identify cell-state transition trajectories. Recently, Saelens et al. (2018) evaluated a number of single-cell trajectory inference approaches (not including CellRouter) and found that Slingshot, TSCAN, and Monocle2 outperformed other methods.
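
The MST-and-backbone idea described for Monocle can be sketched as follows. This is an illustrative simplification and not Monocle's code: it assumes the cells have already been embedded in a low-dimensional space, builds the MST with Prim's algorithm, and takes the longest path through the tree (by number of cells) as the backbone. Cells on side branches are left unordered here, and all function names and coordinates are hypothetical.

```python
import math
from collections import deque


def euclidean(a, b):
    """Euclidean distance between two embedded cells."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points):
    """Prim's algorithm; returns the tree as an adjacency dict."""
    n = len(points)
    adjacency = {i: [] for i in range(n)}
    # best[j] = (distance to the tree, closest tree node) for cells outside it.
    best = {j: (euclidean(points[0], points[j]), 0) for j in range(1, n)}
    while best:
        j = min(best, key=lambda c: best[c][0])
        _, parent = best.pop(j)
        adjacency[j].append(parent)
        adjacency[parent].append(j)
        for c in best:
            d = euclidean(points[j], points[c])
            if d < best[c][0]:
                best[c] = (d, j)
    return adjacency


def farthest_path(adjacency, start):
    """Breadth-first search; returns the path from start to a farthest node."""
    parent = {start: None}
    queue = deque([start])
    last = start
    while queue:
        last = queue.popleft()
        for v in adjacency[last]:
            if v not in parent:
                parent[v] = last
                queue.append(v)
    path = []
    node = last
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def backbone_order(points):
    """Order cells along the longest path (in number of cells) of the MST."""
    tree = minimum_spanning_tree(points)
    one_end = farthest_path(tree, 0)[-1]
    return farthest_path(tree, one_end)


# Toy embedding: five cells along a line plus one side-branch cell (index 5).
toy_points = [(2.0, 0.0), (0.0, 0.0), (4.0, 0.1),
              (1.0, 0.1), (3.0, 0.0), (2.1, 1.0)]
toy_backbone = backbone_order(toy_points)
```

The position of a cell in `toy_backbone` plays the role of pseudotime. Which end is the start cannot be decided from the tree alone, so the direction has to come from prior knowledge such as a known progenitor marker.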

Table 5

Tools	Dimensionality reduction	URL	References
Monocle	ICA	http://cole-trapnell-lab.github.io/monocle-release/	Trapnell et al., 2014
Waterfall	PCA	https://www.cell.com/cms/10.1016/j.stem.2015.07.013/attachment/3e966901-034f-418a-a439-996c50292a11/mmc9.zip	Shin et al., 2015
Wishbone	Diffusion maps	https://github.com/ManuSetty/wishbone	Setty et al., 2016
GrandPrix	Gaussian Process Latent Variable Model	https://github.com/ManchesterBioinference/GrandPrix	Ahmed et al., 2019
SCUBA	t-SNE	https://github.com/gcyuan/SCUBA	Marco et al., 2014
DPT	Diffusion maps	https://media.nature.com/original/nature-assets/nmeth/journal/v13/n10/extref/nmeth.3971-S3.zip	Haghverdi et al., 2016
TSCAN	PCA	https://github.com/zji90/TSCAN	Ji and Ji, 2016
Monocle2	RGE	http://cole-trapnell-lab.github.io/monocle-release/	Qiu et al., 2017
Slingshot	Any	https://github.com/kstreet13/slingshot	Street et al., 2018
CellRouter	Any	https://github.com/edroaldo/cellrouter	Lummertz da Rocha et al., 2018

Methods for single-cell trajectory inference.

Alternative Splicing and RNA Editing Analysis of ScRNA-Seq Data

Most published single-cell studies have explored the transcriptome variation between individual cells mainly at the gene level. In eukaryotic genomes, alternative splicing (AS) allows multi-exon genes to generate different isoforms, which can largely increase the diversity of both protein-coding and noncoding RNAs. Five basic modes of AS are generally recognized: exon skipping (cassette exon), mutually exclusive exons, alternative donor site, alternative acceptor site, and intron retention. Many studies based on bulk RNA-seq data have shown that AS is very common in mammals and that over 90% of human genes could undergo AS (Wang et al., 2008; Chen et al., 2017a). Moreover, AS plays crucial roles in a variety of biological processes, and abnormal AS may be correlated with cancers (Sveen et al., 2016). The findings revealed by bulk RNA-seq data can only reflect the averaged AS patterns of numerous cells at the population level. Due to the high noise (e.g., dropouts and uneven transcript coverage) and low sequencing coverage of scRNA-seq data, the splicing quantification methods initially developed for bulk RNA-seq data are not suitable for scRNA-seq data. Since expression dynamics is a key aspect of cell populations, it is promising to study AS at single-cell resolution to gain insights into cell-level isoform usage. To date, only a few AS detection approaches have been devised for scRNA-seq data, such as SingleSplice (Welch et al., 2016), Census (Qiu et al., 2017), BRIE (Huang and Sanguinetti, 2017), and Expedition (Song et al., 2017) (Table 6). SingleSplice uses a statistical model to detect genes with significant variation in isoform usage without estimating the expression levels of full-length transcripts. Census models the isoform counts of each gene with a linear model as a Dirichlet-multinomial distribution. BRIE is a Bayesian hierarchical model for differential isoform quantification. Expedition contains a suite of algorithms for identifying AS, assigning splicing modalities, and visualizing modality changes. AS detection approaches specially designed for scRNA-seq data are just emerging; thus, the innovation and improvement of such methods will largely facilitate AS exploration at the single-cell level.
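
The effect of low coverage on splicing quantification can be illustrated with the inclusion ratio of a cassette exon, often called percent spliced in (PSI). PSI is not named in the text above and is introduced here only as a commonly used summary; the function name and the `min_reads` threshold are assumptions for this sketch, and none of the tools in Table 6 is claimed to work this way.

```python
def exon_inclusion_psi(inclusion_reads, exclusion_reads, min_reads=10):
    """Fraction of junction reads supporting exon inclusion in one cell.

    Returns None when the cell has too few informative reads, because a
    ratio computed from a handful of reads is dominated by sampling noise.
    """
    total = inclusion_reads + exclusion_reads
    if total < min_reads:
        return None
    return inclusion_reads / total


# Toy junction counts (inclusion, exclusion) for one exon in four cells.
toy_counts = [(30, 10), (2, 1), (0, 25), (0, 0)]
toy_psi = [exon_inclusion_psi(inc, exc) for inc, exc in toy_counts]
```

In this toy example, half of the cells yield no usable estimate, which mirrors the difficulty of applying bulk splicing methods directly to sparse single-cell data.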

Table 6

Tools	URL	References
SingleSplice	https://github.com/jw156605/SingleSplice	Welch et al., 2016
Expedition	https://github.com/YeoLab/Expedition	Song et al., 2017
BRIE	https://github.com/huangyh09/brie	Huang and Sanguinetti, 2017
Census	http://cole-trapnell-lab.github.io/monocle-release/	Qiu et al., 2017

Alternative splicing detection tools for scRNA-seq data.

RNA editing, in turn, is an important post-transcriptional processing event that leads to sequence changes in RNA molecules (Gott and Emeson, 2000). Like AS, RNA editing has mainly been studied using bulk RNA-seq technologies and has rarely been explored at the single-cell level. Currently, the limitations of scRNA-seq have largely prevented the application of RNA-editing detection to individual cells. With further development of both scRNA-seq technologies and single-cell editing detection algorithms, exploration of RNA-editing dynamics among single cells will become feasible. Notably, analyses of both AS and RNA editing are mainly suited to data generated by scRNA-seq protocols that sequence full-length transcripts, such as Smart-seq2 and MATQ-seq, rather than by 3′-end scRNA-seq approaches.

Allelic Expression Exploration with ScRNA-Seq Data

Diploid species contain two sets of chromosomes, one inherited from each parent. Allelic expression analysis can reveal whether genes are equally expressed from the paternal and maternal genomes. For autosomes, the paternal and maternal alleles are generally expressed equally, and aberrant expression of the paternal or maternal genome may cause certain diseases (McKean et al., 2016). To date, few methods have been developed to detect the genome-wide allelic expression profile of genes from scRNA-seq data. One main caveat of allelic expression calling is that the high dropout rate of scRNA-seq data may introduce many false positives. Deng et al. (2014) used a series of stringent criteria to filter out potentially false allelic calls resulting from the technical variability of scRNA-seq when studying the allelic expression profile of mouse preimplantation embryos. The robustness of this strategy was further demonstrated in an analysis of the dynamics of X chromosome inactivation along developmental progression using mouse embryonic stem cells (Chen et al., 2016a). SCALE was recently proposed to classify gene expression into silent, monoallelic, and biallelic states by adopting an empirical Bayes approach (Jiang et al., 2017). We believe that allelic expression analysis at the single-cell level can largely facilitate the understanding of the mechanisms underlying dosage compensation and related diseases. It is worth noting that allelic expression investigation at the single-cell level also requires whole-transcript scRNA-seq and is mainly applicable to organisms for which paternal and maternal single nucleotide polymorphism (SNP) information is available.
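
A simple threshold-based classifier shows why stringent criteria are needed when calling allelic states in single cells. This is only a sketch under stated assumptions: it is neither the filtering scheme of Deng et al. (2014) nor the empirical Bayes model of SCALE, and the function name and both thresholds (`min_total`, `mono_fraction`) are hypothetical.

```python
def classify_allelic_state(maternal_reads, paternal_reads,
                           min_total=10, mono_fraction=0.98):
    """Label one gene in one cell from its allele-specific read counts."""
    total = maternal_reads + paternal_reads
    if total == 0:
        # No allele-informative reads: silent, or lost to dropout.
        return "silent"
    if total < min_total:
        # Too few reads: one allele may be missing purely by chance.
        return "uncertain"
    major_fraction = max(maternal_reads, paternal_reads) / total
    if major_fraction >= mono_fraction:
        if maternal_reads > paternal_reads:
            return "monoallelic_maternal"
        return "monoallelic_paternal"
    return "biallelic"


# Toy allele-specific counts (maternal, paternal) for one gene in four cells.
toy_allele_counts = [(0, 0), (3, 0), (50, 0), (20, 25)]
toy_states = [classify_allelic_state(m, p) for m, p in toy_allele_counts]
```

The second toy cell has reads from one allele only but is labeled uncertain rather than monoallelic, which is the kind of call that a stringent filter is meant to prevent from becoming a false positive.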

Gene Regulatory Network Reconstruction

Gene regulatory network inference has been widely conducted in bulk RNA-seq studies, and scRNA-seq also provides great potential for such analysis. For bulk RNA-seq data, networks are usually constructed from a number of samples using tools such as weighted gene co-expression network analysis (WGCNA) (Langfelder and Horvath, 2008; Chen et al., 2017a). A basic assumption is that genes highly correlated in expression could be co-regulated. Because such an analysis cannot determine the direction of regulation, the resulting networks are typically undirected. In principle, the cells of a scRNA-seq experiment can be treated like the samples of a bulk RNA-seq experiment, so similar approaches are applicable to scRNA-seq data for constructing gene regulatory networks.
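
The co-expression assumption can be sketched with a plain correlation threshold. This is a minimal illustration and not WGCNA, which uses a weighted (soft-thresholded) network; the function names, the cutoff `min_abs_r`, and the toy expression values are assumptions for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant gene carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, min_abs_r=0.8):
    """Return undirected edges (gene_a, gene_b, r) with |r| >= min_abs_r.

    expression: dict mapping gene name -> list of values across cells.
    """
    genes = sorted(expression)
    edges = []
    for i, gene_a in enumerate(genes):
        for gene_b in genes[i + 1:]:
            r = pearson(expression[gene_a], expression[gene_b])
            if abs(r) >= min_abs_r:
                edges.append((gene_a, gene_b, r))
    return edges


# Toy expression of three genes across five cells (values are made up).
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
toy_edges = coexpression_edges(toy_expression)
```

Each edge is stored once as an unordered pair because correlation is symmetric; nothing in the calculation says which gene regulates the other, which is why such networks are undirected.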

Network inference from scRNA-seq data may reveal meaningful gene correlations and provide biologically important insights that could not be uncovered by the population-level data of bulk RNA-seq. However, because of the technical noise of scRNA-seq and the presence of different subpopulations or states of cells, network reconstruction should be performed with care. To reduce spurious results, network inference should be carried out separately on each subpopulation or on cells at the same stage. Recently, Aibar et al. (2017) developed the SCENIC method to reconstruct gene regulatory networks from scRNA-seq data and showed that SCENIC can robustly predict the interactions between transcription factors and target genes. PIDC is another tool designed to infer gene regulatory networks from single-cell data using multivariate information theory (Chan et al., 2017). Such network inference tools facilitate the identification of expression regulatory networks from single-cell transcriptomic data and provide critical biological insights into the regulatory relationships between genes.
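
The recommendation to infer networks within each subpopulation can be motivated with a small numerical example. In the made-up data below, two genes are uncorrelated inside each of two subpopulations, yet they appear strongly correlated when the cells are pooled, simply because both genes are higher in the second subpopulation. The helper `correlation` and all values are assumptions for illustration.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                       * sum((b - mean_y) ** 2 for b in y))
    return cov / spread if spread else 0.0


# Gene X and gene Y in two subpopulations of six cells each (toy values).
pop1_x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
pop1_y = [2.0, 1.0, 2.0, 2.0, 3.0, 2.0]
pop2_x = [11.0, 12.0, 13.0, 11.0, 12.0, 13.0]
pop2_y = [12.0, 11.0, 12.0, 12.0, 13.0, 12.0]

r_within_pop1 = correlation(pop1_x, pop1_y)
r_within_pop2 = correlation(pop2_x, pop2_y)
# Pooling the cells mixes the between-population shift into the correlation.
r_pooled = correlation(pop1_x + pop2_x, pop1_y + pop2_y)
```

A network built on the pooled cells would link the two genes, although the link reflects subpopulation identity rather than co-regulation within a cell type.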

Conclusion

In the past 10 years, great advances have been achieved in scRNA-seq, and a variety of scRNA-seq protocols have been developed. The development and innovation of scRNA-seq have largely facilitated single-cell transcriptomic studies, leading to insightful findings on cell expression variability and dynamics. Moreover, the throughput of scRNA-seq has significantly increased with the progress in cellular barcoding and microfluidics. Meanwhile, scRNA-seq methods that can be used for fixed and frozen samples have also been proposed recently, which will greatly benefit the study of highly heterogeneous clinical samples. However, currently available scRNA-seq approaches still suffer from a high dropout rate, in which weakly expressed genes are missed. Improving RNA capture efficiency and transcript coverage will reduce the technical noise of scRNA-seq. Moreover, since most current scRNA-seq methods mainly capture polyA+ RNAs, the development of protocols that can capture both polyA+ and polyA- RNAs (such as MATQ-seq) will enable comprehensive investigation of both protein-coding and non-coding gene expression dynamics at single-cell resolution.

Since the noise of scRNA-seq data is high, it is crucial to use appropriate methods to address this problem when analyzing scRNA-seq data. QC is necessary to exclude low-quality cells and thus avoid introducing artifacts into data interpretation. Furthermore, batch effect correction (if needed), between-sample normalization, and imputation are also important and should be conducted before cell subpopulation identification, differential expression calling, and other downstream analyses. Additionally, factors such as cell size and cell cycle state could play important roles in cell variability for certain types of cells, and such biases also need to be considered. Although an increasing number of methods have been specially designed to interpret scRNA-seq data, novel methods that can effectively handle the technical noise and expression variability of cells are still required. Specifically, approaches that can accurately analyze AS and RNA editing with scRNA-seq data would be highly useful for unraveling post-transcriptional mechanisms in individual cells. Overall, bioinformatics analysis of scRNA-seq data is still challenging, special attention should be paid to data interpretation, and more efficient tools are urgently needed.

Collectively, scRNA-seq and its related computational methods have largely promoted the development of single-cell transcriptomics. The continuous innovation of scRNA-seq technologies and concomitant advances in bioinformatics approaches will greatly facilitate biological and clinical research and provide deep insights into the gene expression heterogeneity and dynamics of cells.

Disclaimer

The information in these materials is not a formal dissemination of the United States Food and Drug Administration.

Statements

Author contributions

GC and TS designed the study and wrote the manuscript. BN edited the manuscript and provided constructive comments.

Funding

This work was supported by the National Science Foundation of China (31771460, 91629103, and 31671377) and the National Key Research and Development Program of China (2016YFC0902100).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

1. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

References

1
Ahmed, S., Rattray, M., and Boukouvalas, A. (2019). GrandPrix: scaling up the Bayesian GPLVM for single-cell data. Bioinformatics 35, 47–54. doi: 10.1093/bioinformatics/bty533
2
Aibar, S., Gonzalez-Blas, C. B., Moerman, T., Huynh-Thu, V. A., Imrichova, H., Hulselmans, G., et al. (2017). SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086. doi: 10.1038/nmeth.4463
3
Andrews, T. S., and Hemberg, M. (2018a). Identifying cell populations with scRNASeq. Mol. Aspects Med. 59, 114–122. doi: 10.1016/j.mam.2017.07.002
4
Andrews, T. S., and Hemberg, M. (2018b). M3Drop: dropout-based feature selection for scRNASeq. Bioinformatics doi: 10.1093/bioinformatics/bty1044 [Epub ahead of print].
5
Angerer, P., Haghverdi, L., Buttner, M., Theis, F. J., Marr, C., and Buettner, F. (2016). destiny: diffusion maps for large-scale single-cell data in R. Bioinformatics 32, 1241–1243. doi: 10.1093/bioinformatics/btv715
6
Bacher, R., Chu, L. F., Leng, N., Gasch, A. P., Thomson, J. A., Stewart, R. M., et al. (2017). SCnorm: robust normalization of single-cell RNA-seq data. Nat. Methods 14, 584–586. doi: 10.1038/nmeth.4263
7
Bacher, R., and Kendziorski, C. (2016). Design and computational analysis of single-cell RNA-sequencing experiments. Genome Biol. 17:63. doi: 10.1186/s13059-016-0927-y
8
Barrett, S. P., and Salzman, J. (2016). Circular RNAs: analysis, expression and potential functions. Development 143, 1838–1847. doi: 10.1242/dev.128074
9
Becht, E., McInnes, L., Healy, J., Dutertre, C. A., Kwok, I. W. H., Ng, L. G., et al. (2018). Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44. doi: 10.1038/nbt.4314
10
Brennecke, P., Anders, S., Kim, J. K., Kolodziejczyk, A. A., Zhang, X., Proserpio, V., et al. (2013). Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095. doi: 10.1038/nmeth.2645
11
Buettner, F., Natarajan, K. N., Casale, F. P., Proserpio, V., Scialdone, A., Theis, F. J., et al. (2015). Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 33, 155–160. doi: 10.1038/nbt.3102
12
Buttner, M., Miao, Z., Wolf, F. A., Teichmann, S. A., and Theis, F. J. (2019). A test metric for assessing single-cell RNA-seq batch correction. Nat. Methods 16, 43–49. doi: 10.1038/s41592-018-0254-1
13
Cao, J., Packer, J. S., Ramani, V., Cusanovich, D. A., Huynh, C., Daza, R., et al. (2017). Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667. doi: 10.1126/science.aam8940
14
Chan, T. E., Stumpf, M. P. H., and Babtie, A. C. (2017). Gene regulatory network inference from single-cell data using multivariate information measures. Cell Syst. 5, 251–267.e3. doi: 10.1016/j.cels.2017.08.014
15
Chen, G., Chen, J., Yang, J., Chen, L., Qu, X., Shi, C., et al. (2017a). Significant variations in alternative splicing patterns and expression profiles between human-mouse orthologs in early embryos. Sci. China Life Sci. 60, 178–188. doi: 10.1007/s11427-015-0348-5
16
Chen, G., Shi, T. L., and Shi, L. M. (2017b). Characterizing and annotating the genome using RNA-seq data. Sci. China Life Sci. 60, 116–125. doi: 10.1007/s11427-015-0349-4
17
Chen, G., Schell, J. P., Benitez, J. A., Petropoulos, S., Yilmaz, M., Reinius, B., et al. (2016a). Single-cell analyses of X Chromosome inactivation dynamics and pluripotency during differentiation. Genome Res. 26, 1342–1354. doi: 10.1101/gr.201954.115
18
Chen, G., Yang, J., Chen, J., Song, Y., Cao, R., Shi, T., et al. (2016b). Identifying and annotating human bifunctional RNAs reveals their versatile functions. Sci. China Life Sci. 59, 981–992. doi: 10.1007/s11427-016-0054-1
19
Chen, G., Wang, C., and Shi, T. (2011). Overview of available methods for diverse RNA-Seq data analyses. Sci. China Life Sci. 54, 1121–1128. doi: 10.1007/s11427-011-4255-x
20
Chen, L., and Zheng, S. (2018). BCseq: accurate single cell RNA-seq quantification with bias correction. Nucleic Acids Res. 46:e82. doi: 10.1093/nar/gky308
21
Chen, X., Teichmann, S. A., and Meyer, K. B. (2018). From tissues to cell types and back: single-cell gene expression analysis of tissue architecture. Annu. Rev. Biomed. Data Sci. 1, 29–51. doi: 10.1146/annurev-biodatasci-080917-013452
22
Delmans, M., and Hemberg, M. (2016). Discrete distributional differential expression (D3E)–a tool for gene expression analysis of single-cell RNA-seq data. BMC Bioinformatics 17:110. doi: 10.1186/s12859-016-0944-6
23
Deng, Q., Ramskold, D., Reinius, B., and Sandberg, R. (2014). Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196. doi: 10.1126/science.1245316
24
Ding, J., Condon, A., and Shah, S. P. (2018). Interpretable dimensionality reduction of single cell transcriptome data with deep generative models. Nat. Commun. 9:2002. doi: 10.1038/s41467-018-04368-5
25
Dobin, A., and Gingeras, T. R. (2015). Mapping RNA-seq reads with STAR. Curr. Protoc. Bioinformatics 51, 11.14.1–11.14.19. doi: 10.1002/0471250953.bi1114s51
26
Engstrom, P. G., Steijger, T., Sipos, B., Grant, G. R., Kahles, A., Ratsch, G., et al. (2013). Systematic evaluation of spliced alignment programs for RNA-seq data. Nat. Methods 10, 1185–1191. doi: 10.1038/nmeth.2722
27
External RNA Controls Consortium (2005). Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6:150. doi: 10.1186/1471-2164-6-150
28
Fan, H. C., Fu, G. K., and Fodor, S. P. (2015). Expression profiling. Combinatorial labeling of single cells for gene expression cytometry. Science 347:1258367. doi: 10.1126/science.1258367
29
Fan, X., Zhang, X., Wu, X., Guo, H., Hu, Y., Tang, F., et al. (2015). Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos. Genome Biol. 16:148. doi: 10.1186/s13059-015-0706-1
30
Fan, J., Salathia, N., Liu, R., Kaeser, G. E., Yung, Y. C., Herman, J. L., et al. (2016). Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis. Nat. Methods 13, 241–244. doi: 10.1038/nmeth.3734
31
Finak, G., McDavid, A., Yajima, M., Deng, J., Gersuk, V., Shalek, A. K., et al. (2015). MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16:278. doi: 10.1186/s13059-015-0844-5
32
Frazee, A. C., Pertea, G., Jaffe, A. E., Langmead, B., Salzberg, S. L., and Leek, J. T. (2015). Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat. Biotechnol. 33, 243–246. doi: 10.1038/nbt.3172
33
Garber, M., Grabherr, M. G., Guttman, M., and Trapnell, C. (2011). Computational methods for transcriptome annotation and quantification using RNA-seq. Nat. Methods 8, 469–477. doi: 10.1038/nmeth.1613
34
Gierahn, T. M., Wadsworth, M. H. II, Hughes, T. K., Bryson, B. D., Butler, A., Satija, R., et al. (2017). Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput. Nat. Methods 14, 395–398. doi: 10.1038/nmeth.4179
35
Gong, W., Kwak, I. Y., Pota, P., Koyano-Nakagawa, N., and Garry, D. J. (2018). DrImpute: imputing dropout events in single cell RNA sequencing data. BMC Bioinformatics 19:220. doi: 10.1186/s12859-018-2226-y
36
Gott, J. M., and Emeson, R. B. (2000). Functions and mechanisms of RNA editing. Annu. Rev. Genet. 34, 499–531. doi: 10.1146/annurev.genet.34.1.499
37
Griffiths, J. A., Scialdone, A., and Marioni, J. C. (2018). Using single-cell genomics to understand developmental processes and cell fate decisions. Mol. Syst. Biol. 14:e8046. doi: 10.15252/msb.20178046
38
Gross, A., Schoendube, J., Zimmermann, S., Steeb, M., Zengerle, R., and Koltay, P. (2015). Technologies for single-cell isolation. Int. J. Mol. Sci. 16, 16897–16919. doi: 10.3390/ijms160816897
39
Grun, D., Lyubimova, A., Kester, L., Wiebrands, K., Basak, O., Sasaki, N., et al. (2015). Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525, 251–255. doi: 10.1038/nature14966
40
Habib, N., Avraham-Davidi, I., Basu, A., Burks, T., Shekhar, K., Hofree, M., et al. (2017). Massively parallel single-nucleus RNA-seq with DroNc-seq. Nat. Methods 14, 955–958. doi: 10.1038/nmeth.4407
41
Haghverdi, L., Buttner, M., Wolf, F. A., Buettner, F., and Theis, F. J. (2016). Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848. doi: 10.1038/nmeth.3971
42
Haghverdi, L., Lun, A. T. L., Morgan, M. D., and Marioni, J. C. (2018). Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427. doi: 10.1038/nbt.4091
43
Haque, A., Engel, J., Teichmann, S. A., and Lonnberg, T. (2017). A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Med. 9:75. doi: 10.1186/s13073-017-0467-4
44
Hashimshony, T., Senderovich, N., Avital, G., Klochendler, A., de Leeuw, Y., Anavy, L., et al. (2016). CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. 17:77. doi: 10.1186/s13059-016-0938-8
45
Hashimshony, T., Wagner, F., Sher, N., and Yanai, I. (2012). CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep. 2, 666–673. doi: 10.1016/j.celrep.2012.08.003
46
Hicks, S. C., Townes, F. W., Teng, M., and Irizarry, R. A. (2018). Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 19, 562–578. doi: 10.1093/biostatistics/kxx053
47
Hu, P., Zhang, W., Xin, H., and Deng, G. (2016). Single cell isolation and analysis. Front. Cell. Dev. Biol. 4:116. doi: 10.3389/fcell.2016.00116
48
Huang, M., Wang, J., Torre, E., Dueck, H., Shaffer, S., Bonasio, R., et al. (2018). SAVER: gene expression recovery for single-cell RNA sequencing. Nat. Methods 15, 539–542. doi: 10.1038/s41592-018-0033-z
49
Huang, Y., and Sanguinetti, G. (2017). BRIE: transcriptome-wide splicing quantification in single cells. Genome Biol. 18:123. doi: 10.1186/s13059-017-1248-5
50
Hwang, B., Lee, J. H., and Bang, D. (2018). Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp. Mol. Med. 50:96. doi: 10.1038/s12276-018-0071-8
51
Ilicic, T., Kim, J. K., Kolodziejczyk, A. A., Bagger, F. O., McCarthy, D. J., Marioni, J. C., et al. (2016). Classification of low quality cells from single-cell RNA-seq data. Genome Biol. 17:29. doi: 10.1186/s13059-016-0888-1
52
Islam, S., Kjallquist, U., Moliner, A., Zajac, P., Fan, J. B., Lonnerberg, P., et al. (2011). Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167. doi: 10.1101/gr.110882.110
53
Islam, S., Kjallquist, U., Moliner, A., Zajac, P., Fan, J. B., Lonnerberg, P., et al. (2012). Highly multiplexed and strand-specific single-cell RNA 5’ end sequencing. Nat. Protoc. 7, 813–828. doi: 10.1038/nprot.2012.022
54
Islam, S., Zeisel, A., Joost, S., La Manno, G., Zajac, P., Kasper, M., et al. (2014). Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11, 163–166. doi: 10.1038/nmeth.2772
55
Jaitin, D. A., Kenigsberg, E., Keren-Shaul, H., Elefant, N., Paul, F., Zaretsky, I., et al. (2014). Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779. doi: 10.1126/science.1247651
56
Ji, Z., and Ji, H. (2016). TSCAN: pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44:e117. doi: 10.1093/nar/gkw430
57
Jiang, P., Thomson, J. A., and Stewart, R. (2016). Quality control of single-cell RNA-seq by SinQC. Bioinformatics 32, 2514–2516. doi: 10.1093/bioinformatics/btw176
58
Jiang, Y., Zhang, N. R., and Li, M. (2017). SCALE: modeling allele-specific gene expression by single-cell RNA sequencing. Genome Biol. 18:74. doi: 10.1186/s13059-017-1200-8
59
Katayama, S., Tohonen, V., Linnarsson, S., and Kere, J. (2013). SAMstrt: statistical test for differential expression in single-cell transcriptome with spike-in normalization. Bioinformatics 29, 2943–2945. doi: 10.1093/bioinformatics/btt511
60
Kharchenko, P. V., Silberstein, L., and Scadden, D. T. (2014). Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740–742. doi: 10.1038/nmeth.2967
61
Kim, D., Langmead, B., and Salzberg, S. L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360. doi: 10.1038/nmeth.3317
62
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S. L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14:R36. doi: 10.1186/gb-2013-14-4-r36
63
Kiselev, V. Y., Kirschner, K., Schaub, M. T., Andrews, T., Yiu, A., Chandra, T., et al. (2017). SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods 14, 483–486. doi: 10.1038/nmeth.4236
64
Klein, A. M., Mazutis, L., Akartuna, I., Tallapragada, N., Veres, A., Li, V., et al. (2015). Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201. doi: 10.1016/j.cell.2015.04.044
65
Kolodziejczyk, A. A., Kim, J. K., Svensson, V., Marioni, J. C., and Teichmann, S. A. (2015). The technology and biology of single-cell RNA sequencing. Mol. Cell 58, 610–620. doi: 10.1016/j.molcel.2015.04.005
66
Kristensen, L. S., Hansen, T. B., Veno, M. T., and Kjems, J. (2018). Circular RNAs in cancer: opportunities and challenges in the field. Oncogene 37, 555–565. doi: 10.1038/onc.2017.361
67
Langfelder, P., and Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9:559. doi: 10.1186/1471-2105-9-559
68
Leek, J. T. (2014). svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Res. 42:e161. doi: 10.1093/nar/gku864
69
Leek, J. T., Scharpf, R. B., Bravo, H. C., Simcha, D., Langmead, B., Johnson, W. E., et al. (2010). Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11, 733–739. doi: 10.1038/nrg2825
70
Li, B., and Dewey, C. N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323. doi: 10.1186/1471-2105-12-323
71
Li, H., and Homer, N. (2010). A survey of sequence alignment algorithms for next-generation sequencing. Brief. Bioinform. 11, 473–483. doi: 10.1093/bib/bbq015
72
Li, W. V., and Li, J. J. (2018). An accurate and robust imputation method scImpute for single-cell RNA-seq data. Nat. Commun. 9:997. doi: 10.1038/s41467-018-03405-7
73
Lin, P., Troup, M., and Ho, J. W. (2017). CIDR: ultrafast and accurate clustering through imputation for single-cell RNA-seq data. Genome Biol. 18:59. doi: 10.1186/s13059-017-1188-0
74
Love, M. I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15:550. doi: 10.1186/s13059-014-0550-8
75
Lummertz da Rocha, E., Rowe, R. G., Lundin, V., Malleshaiah, M., Jha, D. K., Rambo, C. R., et al. (2018). Reconstruction of complex single-cell trajectories using CellRouter. Nat. Commun. 9:892. doi: 10.1038/s41467-018-03214-y
76
Lun, A. T., Bach, K., and Marioni, J. C. (2016). Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17:75. doi: 10.1186/s13059-016-0947-7
77
Macosko, E. Z., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., et al. (2015). Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214. doi: 10.1016/j.cell.2015.05.002
78
Marco, E., Karp, R. L., Guo, G., Robson, P., Hart, A. H., Trippa, L., et al. (2014). Bifurcation analysis of single-cell gene expression data reveals epigenetic landscape. Proc. Natl. Acad. Sci. U.S.A. 111, E5643–E5650. doi: 10.1073/pnas.1408993111
79
McCarthy, D. J., Campbell, K. R., Lun, A. T., and Wills, Q. F. (2017). Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186. doi: 10.1093/bioinformatics/btw777
80
McDavid, A., Finak, G., Chattopadyay, P. K., Dominguez, M., Lamoreaux, L., Ma, S. S., et al. (2013). Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments. Bioinformatics 29, 461–467. doi: 10.1093/bioinformatics/bts714
81
McKean, D. M., Homsy, J., Wakimoto, H., Patel, N., Gorham, J., DePalma, S. R., et al. (2016). Loss of RNA expression and allele-specific expression associated with congenital heart disease. Nat. Commun. 7:12824. doi: 10.1038/ncomms12824
82
Miao, Z., Deng, K., Wang, X., and Zhang, X. (2018). DEsingle for detecting three types of differential expression in single-cell RNA-seq data. Bioinformatics 34, 3223–3224. doi: 10.1093/bioinformatics/bty332
83
Nichterwitz, S., Chen, G., Aguila Benitez, J., Yilmaz, M., Storvall, H., Cao, M., et al. (2016). Laser capture microscopy coupled with Smart-seq2 for precise spatial transcriptomic profiling. Nat. Commun. 7:12139. doi: 10.1038/ncomms12139
84
Pertea, M., Pertea, G. M., Antonescu, C. M., Chang, T. C., Mendell, J. T., and Salzberg, S. L. (2015). StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295. doi: 10.1038/nbt.3122
85
Picelli, S. (2017). Single-cell RNA-sequencing: the future of genome biology is now. RNA Biol. 14, 637–650. doi: 10.1080/15476286.2016.1201618
86
Picelli, S., Bjorklund, A. K., Faridani, O. R., Sagasser, S., Winberg, G., and Sandberg, R. (2013). Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098. doi: 10.1038/nmeth.2639
87
Pierson, E., and Yau, C. (2015). ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome Biol. 16:241. doi: 10.1186/s13059-015-0805-z
88
Qiu, X., Hill, A., Packer, J., Lin, D., Ma, Y. A., and Trapnell, C. (2017). Single-cell mRNA quantification and differential analysis with Census. Nat. Methods 14, 309–315. doi: 10.1038/nmeth.4150
89
Quinn, J. J., and Chang, H. Y. (2016). Unique features of long non-coding RNA biogenesis and function. Nat. Rev. Genet. 17, 47–62. doi: 10.1038/nrg.2015.10
90
Ramskold, D., Luo, S., Wang, Y. C., Li, R., Deng, Q., Faridani, O. R., et al. (2012). Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 30, 777–782. doi: 10.1038/nbt.2282
91
Risso, D., Ngai, J., Speed, T. P., and Dudoit, S. (2014). Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902. doi: 10.1038/nbt.2931
92
Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., et al. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43:e47. doi: 10.1093/nar/gkv007
93
RobinsonM. D.McCarthyD. J.SmythG. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.Bioinformatics26139–140. 10.1093/bioinformatics/btp616
94
RobinsonM. D.OshlackA. (2010). A scaling normalization method for differential expression analysis of RNA-seq data.Genome Biol.11:R25. 10.1186/gb-2010-11-3-r25
95
RosenbergA. B.RocoC. M.MuscatR. A.KuchinaA.SampleP.YaoZ.et al (2018). Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding.Science360176–182. 10.1126/science.aam8999
96
SaelensW.CannoodtR.TodorovH.SaeysY. (2018). A comparison of single-cell trajectory inference methods: towards more accurate and robust tools.bioRxiv [Preprint]. 10.1101/276907
- CrossRef
- Google Scholar
97
SasagawaY.DannoH.TakadaH.EbisawaM.TanakaK.HayashiT.et al (2018). Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads.Genome Biol.19:29. 10.1186/s13059-018-1407-3
98
SasagawaY.NikaidoI.HayashiT.DannoH.UnoK. D.ImaiT.et al (2013). Quartz-Seq: a highly reproducible and sensitive single-cell RNA sequencing method, reveals non-genetic gene-expression heterogeneity.Genome Biol.14:R31. 10.1186/gb-2013-14-4-r31
99
SatijaR.FarrellJ. A.GennertD.SchierA. F.RegevA. (2015). Spatial reconstruction of single-cell gene expression data.Nat. Biotechnol.33495–502. 10.1038/nbt.3192
100
SettyM.TadmorM. D.Reich-ZeligerS.AngelO.SalameT. M.KathailP.et al (2016). Wishbone identifies bifurcating developmental trajectories from single-cell data.Nat. Biotechnol.34637–645. 10.1038/nbt.3569
101
SeyednasrollahF.RantanenK.JaakkolaP.EloL. L. (2016). ROTS: reproducible RNA-seq biomarker detector-prognostic markers for clear cell renal cell cancer.Nucleic Acids Res.44:e1. 10.1093/nar/gkv806
102
ShengK.CaoW.NiuY.DengQ.ZongC. (2017). Effective detection of variation in single-cell transcriptomes using MATQ-seq.Nat. Methods14267–270. 10.1038/nmeth.4145
103
ShinJ.BergD. A.ZhuY.ShinJ. Y.SongJ.BonaguidiM. A.et al (2015). Single-cell RNA-Seq with waterfall reveals molecular cascades underlying adult neurogenesis.Cell Stem Cell17360–372. 10.1016/j.stem.2015.07.013
104
SonesonC.RobinsonM. D. (2018). Bias, robustness and scalability in single-cell differential expression analysis.Nat. Methods15255–261. 10.1038/nmeth.4612
105
SongY.BotvinnikO. B.LovciM. T.KakaradovB.LiuP.XuJ. L.et al (2017). Single-cell alternative splicing analysis with expedition reveals splicing dynamics during neuron differentiation.Mol. Cell67148–161.e5. 10.1016/j.molcel.2017.06.003
106
StegleO.TeichmannS. A.MarioniJ. C. (2015). Computational and analytical challenges in single-cell transcriptomics.Nat. Rev. Genet.16133–145. 10.1038/nrg3833
107
StreetK.RissoD.FletcherR. B.DasD.NgaiJ.YosefN.et al (2018). Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics.BMC Genomics19:477. 10.1186/s12864-018-4772-0
108
SveenA.KilpinenS.RuusulehtoA.LotheR. A.SkotheimR. I. (2016). Aberrant RNA splicing in cancer; expression changes and driver mutations of splicing factor genes.Oncogene352413–2427. 10.1038/onc.2015.318
109
SvenssonV.NatarajanK. N.LyL. H.MiragaiaR. J.LabaletteC.MacaulayI. C.et al (2017). Power analysis of single-cell RNA-sequencing experiments.Nat. Methods14381–387. 10.1038/nmeth.4220
110
TalwarD.MongiaA.SenguptaD.MajumdarA. (2018). AutoImpute: autoencoder based imputation of single-cell RNA-seq data.Sci. Rep.8:16329. 10.1038/s41598-018-34688-x
111
TangF.BarbacioruC.WangY.NordmanE.LeeC.XuN.et al (2009). mRNA-Seq whole-transcriptome analysis of a single cell.Nat. Methods6377–382. 10.1038/nmeth.1315
112
TrapnellC.CacchiarelliD.GrimsbyJ.PokharelP.LiS.MorseM.et al (2014). The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells.Nat. Biotechnol.32381–386. 10.1038/nbt.2859
113
TrapnellC.WilliamsB. A.PerteaG.MortazaviA.KwanG.van BarenM. J.et al (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.Nat. Biotechnol.28511–515. 10.1038/nbt.1621
114
VallejosC. A.MarioniJ. C.RichardsonS. (2015). BASiCS: bayesian analysis of single-cell sequencing data.PLoS Comput. Biol.11:e1004333. 10.1371/journal.pcbi.1004333
115
VallejosC. A.RichardsonS.MarioniJ. C. (2016). Beyond comparisons of means: understanding changes in gene expression at the single-cell level.Genome Biol.17:70. 10.1186/s13059-016-0930-3
116
VallejosC. A.RissoD.ScialdoneA.DudoitS.MarioniJ. C. (2017). Normalizing single-cell RNA sequencing data: challenges and opportunities.Nat. Methods14565–571. 10.1038/nmeth.4292
117
van der MaatenL.HintonG. (2008). Visualizing data using t-SNE.J. Mach. Learn. Res.92579–2605.
- Google Scholar
118
van DijkD.SharmaR.NainysJ.YimK.KathailP.CarrA. J.et al (2018). Recovering gene interactions from single-cell data using data diffusion.Cell174716–729.e27. 10.1016/j.cell.2018.05.061
119
VuT. N.WillsQ. F.KalariK. R.NiuN.WangL.RantalainenM.et al (2016). Beta-Poisson model for single-cell RNA-seq data analyses.Bioinformatics322128–2135. 10.1093/bioinformatics/btw202
120
WangE. T.SandbergR.LuoS.KhrebtukovaI.ZhangL.MayrC.et al (2008). Alternative isoform regulation in human tissue transcriptomes.Nature456470–476. 10.1038/nature07509
121
WelchJ. D.HuY.PrinsJ. F. (2016). Robust detection of alternative splicing in a population of single cells.Nucleic Acids Res.44:e73. 10.1093/nar/gkv1525
122
XuC.SuZ. (2015). Identification of cell types from single-cell transcriptomes using a novel clustering method.Bioinformatics311974–1980. 10.1093/bioinformatics/btv088
123
ZeiselA.Munoz-ManchadoA. B.CodeluppiS.LonnerbergP.La MannoG.JureusA.et al (2015). Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq.Science3471138–1142. 10.1126/science.aaa1934
124
ZhangL.ZhangS. (2018). Comparison of computational methods for imputing single-cell RNA-sequencing data.IEEE/ACM Trans. Comput. Biol. Bioinform.10.1109/TCBB.2018.2848633 [Epub ahead of print].
125
ZhengG. X.TerryJ. M.BelgraderP.RyvkinP.BentZ. W.WilsonR.et al (2017). Massively parallel digital transcriptional profiling of single cells.Nat. Commun.8:14049. 10.1038/ncomms14049
126
ZiegenhainC.ViethB.ParekhS.ReiniusB.Guillaumet-AdkinsA.SmetsM.et al (2017). Comparative analysis of single-cell RNA sequencing methods.Mol. Cell65631–643.e4. 10.1016/j.molcel.2017.01.023
127
ZurauskieneJ.YauC. (2016). pcaReduce: hierarchical clustering of single cell transcriptional profiles.BMC Bioinformatics17:140. 10.1186/s12859-016-0984-y

Summary

Keywords

single-cell RNA-seq, cell clustering, cell trajectory, alternative splicing, allelic expression

Citation

Chen G, Ning B and Shi T (2019) Single-Cell RNA-Seq Technologies and Related Computational Data Analysis. Front. Genet. 10:317. doi: 10.3389/fgene.2019.00317

Received

05 December 2018

Accepted

21 March 2019

Published

05 April 2019

Volume

10 - 2019

Edited by

Filippo Geraci, Italian National Research Council (CNR), Italy

Reviewed by

Vsevolod Jurievich Makeev, Vavilov Institute of General Genetics (RAS), Russia; Iros Barozzi, Imperial College London, United Kingdom

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Geng Chen, gchen@bio.ecnu.edu.cn; Tieliu Shi, tieliushi@yahoo.com

This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics.

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
