Identifying Differentially Expressed Genes of Zero Inflated Single Cell RNA Sequencing Data Using Mixed Model Score Tests

Single cell RNA sequencing (scRNA-seq) allows quantitative measurement and comparison of gene expression at the resolution of single cells. Ignoring the batch effects and zero inflation of scRNA-seq data, many proposed differentially expressed (DE) methods might generate bias. We propose a method, single cell mixed model score tests (scMMSTs), to efficiently identify DE genes of scRNA-seq data with batch effects using the generalized linear mixed model (GLMM). scMMSTs treat the batch effect as a random effect. For zero inflation, scMMSTs use a weighting strategy to calculate observational weights for counts independently under zero-inflated and zero-truncated distributions. Counts data with calculated weights were subsequently analyzed using weighted GLMMs. The theoretical null distributions of the score statistics were constructed by mixed Chi-square distributions. Intensive simulations and two real datasets were used to compare edgeR-zinbwave, DESeq2-zinbwave, and scMMSTs. Our study demonstrates that scMMSTs, as supplement to standard methods, are advantageous to define DE genes of zero-inflated scRNA-seq data with batch effects.


INTRODUCTION
In modern biology, transcriptomics has been widely used to elucidate the molecular basis of biological processes and diseases (Van den Berge et al., 2018). Previous transcriptome sequencing techniques (bulk RNA-seq) ) might obscure the cell type heterogeneity in different samples. Because of the resolution, bulk RNA-seq hardly defines the rare cells, such as stem cells and tumor cells. Single cell RNA sequencing (scRNA-seq) enables researchers to study characteristics of gene expression in the resolution of individual cells (Kolodziejczyk et al., 2015). scRNA-seq has been treated as an effective method to study cellular heterogeneity in complex biological systems, and is being applied by more researchers in various biological processes, such as stem cell development and differentiation, embryonic organ development, tumors, immunology, and neurology (Tang et al., 2009;McEvoy et al., 2011;Zeisel et al., 2015;Chu et al., 2016;Papalexi and Satija, 2018;Sun et al., 2019). Identifying differentially expressed (DE) genes is one of the most common analysis of both bulk RNA-seq and of scRNA-seq analysis (Robinson et al., 2010;Van den Berge et al., 2017, 2018Sun et al., 2018).
For bulk RNA-seq and scRNA-seq data, batch effects conventionally were treated as the non-biological differences that occurs when samples or cells are measured in distinct batches. The measure of transcriptome can be influenced by different environments for cells (Luecken and Theis, 2019). Various methods to correct batch effects and preserve biological variability have been presented. Some methods directly remove or correct batch effects using linear models (Johnson et al., 2007;Tung et al., 2017;Somekh et al., 2019). ComBat (Johnson et al., 2007) is an empirical Bayes method which takes batch effects into a linear regression model of gene expression. ComBat was recommended for batch correction when groups or cell types and state compositions between batches are consistent (Luecken and Theis, 2019). Mutual nearest neighbors (MNNs) (Haghverdi et al., 2018) and canonical correlation analysis (CCA) (Butler et al., 2018) remove batch effects using nonlinear models. A method comparison study showed ComBat was the best one for both bulk RNA-seq and scRNA-seq data (Büttner et al., 2019). For DE analysis, it was recommended that DE testing should be conducted on measure data with covariates including the batch information in the model design, not on batch corrected data (Luecken and Theis, 2019).
Some studies directly used traditional bulk RNA-seq DE methods (Krieg et al., 2018;Roerink et al., 2018;Li et al., 2019;Mehtonen et al., 2020). Limma-voom (Ritchie et al., 2015) applies weighted linear regression models for log-transformed count data. edgeR (Robinson et al., 2010;McCarthy et al., 2012) and DESeq2 (Love et al., 2014) model the gene expression count data based on generalized linear models (GLMs) under negative binomial (NB) distributions. It was demonstrated that NB models overestimated the dispersion parameter with excess zero counts, which influenced the power to DE analysis ( Van den Berge et al., 2018). Different to bulk RNA-seq data, dropout events cause excess zeros for scRNA-seq read count data (Finak et al., 2015;Hashimshony et al., 2016). Therefore, zero inflation or an excess of zeros is a particular feature of scRNA-seq data, and it is not considered for these methods. SCDE (Kharchenko and Fan, 2019) and MAST (Finak et al., 2015;McDavid et al., 2019) model the redundant zeros of scRNA-seq data by zero inflation and hurdle models, respectively. Both zinbwave Van den Berge et al., 2018) and zingeR (Van den Berge et al., 2017) estimates observational weights based on a zeroinflated negative binomial (ZiNB) model and downweight excess zeros followed by classical bulk RNA-seq DE tools (e.g., edgeR and DESeq2). The performance of two combinations, edgeRzinbwave and DESeq2-zinbwave, outperform other DE methods (Van den Berge et al., 2018).
Here, based on isoVCT (Yang et al., 2017) and SMMATs (Chen et al., 2019), we implement a series of efficient methods, the single cell mixed model score tests (scMMSTs), to identify DE genes for defined cell types in scRNA-seq data considering batch effects and zero inflation. isoVCT, a DE method for bulk RNA-seq, uses a random effect to consider the heterogeneous isoform effects. In large-scale whole-genome sequencing (WGS) studies, SMMATs are powerful and computationally efficient variant set tests for continuous and binary traits, which integrates the burden test and SKAT (Wu et al., 2011) under the framework of generalized linear mixed models (GLMMs).

Generalized Linear Mixed Models
For a single gene, we consider the following: where g (·) is a monotonic differentiable link function for GLMs, µ i = E y i |g i , B i , b denotes the mean of phenotype or count y i for subject or cell i for a given gene with sample size n to the intercept α, g i is the group, cluster or cell type covariate dummy variable binary value for subject i, B i is the row vector of dummy variables values of the batch or individual covariate for subject i, β is the group effects associated with bathes and b is the batch effects. In the above equation, the group effects β are assumed to follow the normal distribution N(β 0 1 p , σ 2 β I p ), where 1 p is the p × 1 dimensional vector whose elements are all 1, I p is the p × p dimensional identity matrix, β 0 and σ 2 β are mean and variance of the normal distribution and p is the number of batches. If σ 2 β > 0, group effects are associated with the batches. We assume the batch random effects b ∼ N(0 p , σ 2 b I p ), where 0 p is the p × 1 dimensional vector whose elements are all 0 and σ 2 b is the variance. We consider the binomial, quasi-binomial, Poisson, quasi-Poisson, and NB distributions to model y i . Binary phenotypes are commonly modeled by binomial and quasibinomial distributions and counts are commonly modeled by Poisson, quasi-Poisson, and NB distributions.
For single cell RNA-seq data of a given gene, y i is the count for cell i. We identify DE genes for each defined cell type in the form of one-against-others, so g i , the cell type covariate for cell i, is binary. GLMMs under Poisson, quasi-Poisson and NB distributions are appropriate in this scenario.

Single Cell Mixed Model Score Tests
Testing H 0 : β = 0 is equivalent to testing H 0 : β 0 = 0 and σ 2 β = 0. Under the null hypothesis, the reduced GLMM is as follows.
We construct a variance component score test statistic T derived by testing H 0 : σ 2 β = 0 under the assumption β 0 = 0. SMMAT-O was also derived in the same manner. Under H 0 with the assumption β 0 = 0, we have the same reduced null model as that under H 0 : β = 0. Therefore, our derived test statistic T is applicable for testing H 0 . The test statistic T is shown as follows.
where y = y 1 y 2 · · · y n T is an n × 1 vector of counts or phenotypes, µ 0 = g −1 α+B i b is the estimated mean vector of Frontiers in Genetics | www.frontiersin.org the reduced null model under H 0 , α and b are estimates of the α and b, = diag 1/ 1 + µ 0i / θ for the NB distribution with the estimated dispersion parameter θ and = I n for other distributions mentioned, B = B T 1 B T 2 · · · B T n T is an n × p design matrix of group covariate dummy variables values, G B = g 1 B T 1 g 2 B T 2 · · · g n B T n T is an n × p design matrix of interactions of group and batch covariates with the multiplication of corresponding dummy variables values and τ is the estimate of dispersion parameter τ for quasi distributions, which is 1 for the binomial, Poisson and NB distributions and is estimated by the residual deviance divided by the degree of freedom of the reduced null model for quasi-binomial and quasi-Poisson distributions. The asymptotic distribution of the statistic T under H 0 is derived as follows. Following the theoretical results of mixed models (Harville, 1977;Breslow and Clayton, 1993;Santos Nobre and da Motta Singer, 2007;Chen et al., 2016), we have e = (y − µ 0 )/ √ τ asymptotically following a n-dimensional multivariate normal distribution MVN n (0, D −1 V P P V D −1 ) under H 0 , where D = diag g ( µ 0i ) , whose diagonal elements are the first order derivative of the link function g (·) evaluated at µ 0i , P is the n × n projection matrix of the Var(y i ) , the first order derivative function of the link function g (·) and the estimated variance of y i , Var(y i ). For binomial and quasi-binomial . For Poisson and quasi-Poisson distributions, g ( µ 0i ) 2 Var(y i ) = 1/ µ 0i . For NB distributions, g ( µ 0i ) 2 Var y i = (1/ µ 0i ) + 1/ θ . Since P P = P and = V −1 D, the asymptotic distribution can be simplified as MVN n (0, −1 P −1 ). Therefore, under H 0 , T, a quadratic form of e, asymptotically follows a mixture Chisquare distribution p i=1 ξ i χ 2 1,i , where χ 2 1,i are independent Chi-square distributions with 1 degree of freedom, and ξ i are the eigenvalues of E = G T B PG B . Notably, in P has a simple structure which makes −1 to be solved explicitly and E to be calculated efficiently. The p-value of the test can be calculated soon after the estimation of the reduced null model. More details of the computational efficiency of scMMSTs are discussed in section "Performance Evaluation". The estimation procedure of µ 0i is the same for binomial and quasi-binomial distribution pair and the Poisson and quasi-Poisson distribution pair. Thus, we implement quasi distributions to allow flexibility. In the followings, unless specified otherwise, "binomial" stands for both binomial and quasi-binomial and "Poisson" stands for both Poisson and quasi-Poisson.
There is zero inflation in scRNA-seq count data. Therefore, following the idea of ZINB-WaVE, a weighting strategy is implemented. Firstly, observational weights are calculated for all counts independently with details shown in sections "Zero-Inflated and Zero-Truncated Distributions for Counts" and "Calculations of Observational Weights for scMMSTs." Afterward, counts data with calculated weights are analyzed under the weighted GLMMs. Accordingly, a weighted version test statistic T w for scMMSTs is proposed as follows with above notations.
where W = diag {w i } and w i is the given weights for count y i . The estimation is based on the weighted GLLMs for the reduced null model. We denote 1 w,n = W w . Based on the theoretical results of weighted GLMMs (Harville, 1977;Breslow and Clayton, 1993;Santos Nobre and da Motta Singer, 2007;Chen et al., 2016), if H 0 and W are true, we have e asymptotically normally distributed as MVN n (0, D −1 V w P w If H 0 and W are true, T w , a quadratic form of e, asymptotically follows a mixture Chi-square distribution p i=1 ξ i χ 2 1,i , where χ 2 1,i are independent Chi-square distributions with 1 degree of freedom, and ξ i are the eigenvalues of E w = G T B P w G B . Note that w in P w does not have the simple structure of , which makes it hard to analytically and explicitly solve −1 w . Therefore, we propose E w = G T B W PWG B to approximate E w for simplicity and efficiency, where we treat e as it is estimated by GLMMs without weights. Calculated weights are 1 for nonzero counts and between 0 and 1 for zero counts. Thus, this approximation performs worse when there are more redundant zeros, which might influence the performance of scMMSTs.

Zero-Inflated Distributions for Counts
A zero-inflated distribution for counts is a mixture distribution with two components, which are a point mass at zero and a conventional random variable distribution for counts, e.g., Poisson and NB distributions. The probability mass function (pmf) of a zero-inflated distribution for counts is as follows.
where π ∈ [0, 1] indicates the probability of zero inflation, δ 0 (·) the Dirac function, f (·; θ) the pmf of a conventional distribution with parameter vector θ. The observational weights of the counts can be calculated under a zero-inflated distribution model as the conditional probability that a given count y belongs to the conventional distribution with parameter estimates θ, π: Note that w is 1 for nonzero counts and ∈ (0, 1) for zeros counts. All the weights for counts under the conventional distribution are 1. Under a zero-inflated distribution, we take the weights of nonzero counts remain 1 and downweight zero counts from 1 to the conditional probability that a given count y belongs to the conventional distribution. Counts with observational weights are subsequently analyzed under the weighted version of models for the conventional distribution. In ZINB-WaVE, this weighting strategy is applied and the above formula is applied to calculate observational weights under the ZiNB distribution (Van den Berge et al., 2018).

Zero-Truncated Distributions for Counts
A zero-truncated distribution for counts is a distribution for counts with random variable values truncated at zero, i.e., only counts larger than zero can be observed. In the followings, we refer to zero-truncated distributions as truncated distributions for short. The pmf of a truncated distribution for counts is as follows.
where f (·; θ) denotes the pmf of a conventional distribution for counts with parameter vectorθ. The observational weights of nonzero counts are 1 and weights of zero counts can be calculated under a truncated distribution model as following: where n 1 is the number of nonzero counts, n 0 is the number of the zero counts in the whole sample and θ is the parameter vector estimate. The derivation of the above formula is as follows. Nonzero counts follow the truncated distribution with parameter θ which is the also the parameter for the corresponding conventional distribution. Therefore, the probability of zero counts is estimated as f y = 0; θ . All the weights for counts under the conventional distribution are 1. However, since excess zeros are presented, the observational weights of nonzero counts remain 1 and zero counts are reweighted from 1 to w, so that w·n 0 w·n 0 +1·n 1 = f y = 0; θ . The resulting formula for observational weights w is derived by solving the equation. Counts are then analyzed with observational weights calculated under the weighted version of models for the conventional distribution.

Calculations of Observational Weights for scMMSTs
In ZINB-WaVE, the weighting strategy shown in the previous section is applied and observational weights are estimated by the ZiNB regression (Van den Berge et al., 2018). For our methods, the truncated Poisson (TrPois), zero-inflated Poisson (ZiPois), truncated negative binomial (TrNB), and ZiNB distributions are considered. Following the weighting strategy mentioned and H 0 : β = 0, we estimate parameters for counts in each batch and calculate the weights accordingly using the formulas in section "Zero-Inflated and Zero-Truncated Distributions for Counts" for simplicity with the assumption of no group effects.
For zero-inflated distributions, weights are the conditional probabilities that a count y belongs to the corresponding conventional distribution. We directly use ZINB-WaVE for the ZiNB distribution, and implement the algorithm in Appendix A of the paper (Böhning et al., 1999) for the ZiPois distribution. In ZINB-WaVE, no mixed models are involved. Thus, we treat batch effects as fixed effects in the ZiNB regression without group effects to calculate weights using all counts data, when using ZINB-WaVE. For TrPois distribution, since the pmf f TrPois y = f Pois( y) 1−e −λ = λ y e −λ y!1−e −λ , we can derive the method of moment estimate and maximum likelihood estimate λ and they are identical by numerically solve the equation λ 1−e − λ = y, where y is the sample mean for the truncated sample. For each batch, the weights are w i = n 1 e − λ n 0 1−e − λ for a zero count and w i = 1 for nonzero y i , where n 1 is truncated sample size for the batch and n 0 is the number of the zero counts in the batch. TrPois and ZiPois perform very close to each other. For TrNB distribution, we implement the formulas in section "Results" of the paper (Rider, 1955) to estimate the mean parameter µ and the dispersion parameter θ for each batch. The common dispersion parameter θ is estimated by the harmonic mean of the estimated θ for each batch. However, this algorithm is not robust for small θ (θ < 2, based on simulations). The weights are w i = n 1 θ/ θ+ µ θ n 0 1− θ/ θ+ µ θ for zero counts in each batch, where θ and µ are respectively the estimated dispersion and mean parameters for the NB distribution using counts in the batch, and w i = 1 for nonzero y i for each corresponding batch.
After weights are calculated, counts data with weights are analyzed under weighted GLMMs shown in section "Single Cell Mixed Model Score Tests." Note that weights are calculated independently of GLMMs. Theoretically, the weights are 1 under conventional distributions. The calculated observational weights for nonzero counts remain 1. If there are calculated weights of zero counts far from 1 and closer to 0, it indicates that there are excess zeros. If calculated weights of zero counts are close to 1, the results for conventional distributions are similar to those considering zero inflation. In ZiNB-Wave, weights are calculated through the ZiNB regressions on all counts. However, the weights for TrPois, ZiPois, and TrNB are calculated using counts for each batch with smaller sample sizes. Therefore, although the calculation of weights for TrPois, ZiPois and TrNB is easier to implement and time saving, it is less accurate and less reliable than that for ZiNB-Wave and the performances of scMMSTs are affected.

Performance Evaluation
Performances of DE methods considered are assessed in terms of the per-comparison error rate (PCER), which refers to type I error rate (i.e., the proportion of false positives), line plots of the true positive rate (TPR) vs. the false discovery proportion (FDP) and the areas under the receiver operating characteristic (ROC) curves [i.e., the TPR vs. the false positive rate (FPR) curves] (AUCs) with definitions as follows.
where we use the following abbreviations for empirical quantities: FP (the number of false positives), TP (the number of true positives), N (the number of negative samples), P (the number of positive samples). FDP-TPR curves for adjusted p-values are plotted by iCOBRA Bioconductor R package (version 1.12.1) (Soneson and Robinson, 2016) and AUCs for adjusted p-values are calculated by pROC R package (version 1.16.2) (Robin et al., 2011). Unless otherwise stated, the adjusted p-values for all DE methods considered are calculated by the Benjamini and Hochberg method (Benjamini and Hochberg, 1995) for FDR control.

Comparison Methods
The 12 methods considered for comparisons are Poisson, TrPois, ZiPois, NB, TrNB, NB-zinb, DESeq2, DESeq2-zinb, edgeR, edgeR-zinb, limma-voom, and MAST. The first six methods are our implemented methods of scMMSTs s under GLMMs assumptions and the last six methods are the state-of-the-art DE methods, where Tr, Zi, Pois, NB, and zinb are abbreviations of truncated, zero-inflated, Poisson, ZINB-WaVE, respectively. We follow the implementations of the last six DE methods above in the zinbwave paper (Van den Berge et al., 2018) and the R packages used are edgeR (version 3.28.1), DESeq2 (version 1.26.0), limma (version 3.42.2), MAST (version 1.12.0), and zinbwave (version 1.8.0), which was developed to deal with zero inflation for scRNA-seq data by a weighting strategy and was used in edgeR-zinb, DESeq2-zinb, and NB-zinb. The binomial distribution scMMST is implemented, however, not covered in the simulations and real data analysis since only methods for count data are considered in thisarticle.
The implementations of scMMSTs are available in Supplementary Data S1. Codes for simulations and real data analysis are partially based on the GitHub repositories 12 of papers (Yang et al., 2017;Van den Berge et al., 2018)  Simulated single cell datasets are generated by splatter R package (version 1.10.1) (Zappia et al., 2017). Additionally, the code to reproduce all analyses, figures and tables reported in this manuscript is attached in Supplementary Data S1.

Simulations
We perform simulations to evaluate performances of scMMSTs, which are our methods of association tests under the proposed GLMMs, comparing with state-of-art DE methods under a range of scenarios. We simulate the scRNA-seq data based on GLMMs directly and by the R package splatter. Splatter can directly estimate model parameters for real scRNA-seq data and generate quality controlled simulated mock datasets with DE genes easily and can add batch effects, which are not associated with group effects, to the simulated data. The simulated number of genes for one dataset by splatter and GLMMs is 10,000 and the number of cells is 250 with balanced two groups and five batches. In the DE genes simulations, the proportion of the DE genes is set to be 0.1.
Additional parameters of splatter simulations, batch.facLocbatch factor location, batch.facScale-batch factor scale, and out.prob-the expression outlier probability, are set to be 0.5. For DE gene simulations, de.facLoc, DE factor location, is set to 2 and de.facScale, DE factor scale, is set to be 0.5.
The procedure to simulate datasets based on the proposed GLMMs is as follows. We assume that the scRNA-seq count data follow Poisson and NB distributions and generate y i based on the GLMM shown with the parameters setting and generate a Bernoulli random variable z i with parameter π i = logit −1 (µ π + B i b). Larger values of parameter µ π causes smaller baseline proportions of zeros. If z i = 0, then y i = 0, and y i remains the same otherwise. The parameter settings for simulations are based on the real data analysis and references (Yang et al., 2017). Seven parameters are considered: the variance of the batch or individual effects b(σ 2 b ), the variance of the group or cell type effects β(σ 2 β ), the baseline group effect (β 0 ), the number of batches (p), the dispersion parameter (θ = 1/φ) for NB distributions and the intercepts (µ 0 ) and (µ π ) for the GLMM and logstic regression for excess zeros, respectively. σ 2 b shows the heterogeneity of batch effects in different batches. σ 2 β shows the heterogeneity of group effects in different batches. β 0 shows the baseline group effect. The larger the |β 0 |, the larger the baseline group effect is. Other parameters describe the features of the gene expression and zero inflation. σ 2 b is set to be 0.25 and σ 2 β varies in 0, 0.01, 0.25, and 1. β 0 varies in 0, 0.01, 0.1, 0.3, and 0.5. θ varies in 0.5, 1, and 2. µ π varies in −1, 0, and 2. p = 5 and µ 0 = 5.

Usoskin Dataset
This scRNA-seq dataset contains mouse neuronal cells in the dorsal root ganglion (Usoskin et al., 2015). The processed expression values were downloaded from the Github respiratory 3 of the zinbwave paper. Following the process procedures given in the zinbwave paper, the authors considered 622 cells with a classification of 11 neuronal cell-types, which were denoted as NF1 to NF5, NP1 to NP3, PEP1, PEP2 and TH. Genes with less than 20 counts were removed and a total of 12,132 genes are considered for the following analyses with 68% zero counts. The authors showed the existence of a batch effect related to the picking session for the cells. Thus, the picking session covariate (with values Cold, RT-1, and RT-2) in this dataset was considered as a batch covariate for real data analysis. The batch effect was associated with expression measures and the relationship between zero inflation and sequencing depth, which was shown in Figure 5 of the zinbwave paper (Hicks et al., 2015;Van den Berge et al., 2018). We repeated the results of Figures 5A,B of the zinbwave paper in Supplementary Figures S1A,B. There is a large variation in the depth of sequencing among batches, which weaken the overall association with zero inflation when pooling cells across batches (Supplementary Figure S1A). Zero inflation was also identified for the Usoskin dataset. Histograms of observational weights for nonzero counts, which were calculated by the ZINB-WaVE model including the cell type as a covariate with and without the batch effect as fixed effects, are shown in Supplementary Figure S1B. Calculated weights of nonzero counts with and without the batch effect both have high modes near zero. This suggests zero inflation in the Usoskin dataset. The real data analysis of the processed Usoskin dataset was done to identify DE genes for defined 11 cell types vs. the rest. Simulated datasets based on this dataset were generated by spaltter with estimated corresponding parameters. For a null dataset without DE genes, we created 10,000 genes, 250 cells, five balanced batches and two balanced groups for cells. Twelve methods were implemented to identify DE genes between the two groups for each of the 30 simulated null data sets. A gene was declared to be DE if its unadjusted p-value was less than or equal to 0.05. Declared DE genes were false positives for these simulated null datasets. The empirical PCER of each method was calculated as the proportion of declared DE genes and was compared to the 0.05 nominal PCER.

Tung Dataset
This scRNA-seq dataset is for induced pluripotent stem cells from three individuals from HapMap (Tung et al., 2017). Following the splatter paper (Zappia et al., 2017), the matrix of molecules (UMIs) was treated as counts and was used directly. This dataset is available from GEO (accession GSE77288) 4 and the Github respiratory 5 of the splatter paper. No batch information is available for this dataset. Genes with less than 20 counts were removed and a total of 14,893 genes with 864 cells containing 44% zero counts were considered. Zero inflation was identified for the Tung dataset. Histograms of observational weights of nonzero counts of two filtered datasets (18,726 genes with more than 0 count and 14,893 genes with more than 19 counts, respectively), which were calculated by the ZINB-WaVE model, are shown in Supplementary Figures S1C,D. There are moderate proportion s of calculated weights of nonzero counts close to zero. This suggests zero inflation in the Tung dataset. Comparing to the Usoskin dataset, the Tung dataset is less zero inflated. We generated 30 simulated null datasets and identified DE genes using the same procedures for the Usoskin dataset with spaltter.

Method Overview
Single cell mixed model score tests are computationally efficient DE analysis tools for scRNA-seq data considering batch effects and zero inflation. Bath effects are estimated as random effects under the reduced null models of GLMMs. A weighting strategy is implemented to characterize excess zeros. The score statistics are derived on theoretical asymptotic distributions. First, we estimated normalization factors of count matrix by the function calcNormFactors in edgeR after counts per million (CPM) normalization. Second, the estimation of the observational weights is efficient. We use zinbwave to fit NBzinb which might be the most time-consumed assumption. Third, we use lme4 for the estimation, the most efficient method to fit GLMM, to estimation the parameters in the null hypothesis (Eddelbuettel and François, 2011;Eddelbuettel, 2013;Eddelbuettel and Balamuta, 2017). Considering the real data, the estimation procedure of mixed model is not related to the number of groups or cell types. Compared to the traditional estimation procedure, scMMSTs use three strategies to decrease memory usage and computation time. First, scMMSTs do not need to store n × n matrices P and explicitly. The p-value is efficiently calculated by CompQuadForm with eigenvalues of E or E w , which is only a p × p matrix. Second, scMMSTs use an analytical form to calculate the inverse of which might be the most time consumption procedure in the estimation of T or T w . Third, scMMSTs is implemented for parallel computing. Therefore, although more complicated models GLMMs are considered, scMMSTs are computationally affordable compared to other DE methods.

Simulations by Real Datasets and Splatter
Simulated datasets generated by the splatter used parameters estimated from two publicly available real scRNA-seq datasets, the Usoskin (Usoskin et al., 2015) and Tung (Tung et al., 2017) datasets.
The FPR control was assessed by the PCER. Results are shown in Figure 1. For the Usoskin dataset, the estimated common dispersion parameter value of biological coefficient of variation (BCV) wasφ = 1/ θ =1.89. TrNB and Poisson failed to control the FPR. The PCERs of NB-zinb, DESeq2, edgeR-zinb, and edgeR were a little inflated. DESeq2-zinb and MAST controlled the FPRs with large variability, especially for MAST. Other methods were a little conservative with PCERs smaller than the nominal level 0.05. For the Tung dataset, the estimated common dispersion parameter value of BCV wasφ = 1/ θ = 0.11. Poisson failed to control the FPR. The PCERs of TrNB and edgeR were a little inflated. Other methods conservatively controlled FPRs, especially for NB, DESeq2-zinb, and NB-zinb. We treated "NA" p-values of DE methods as 1, thus, there are peak bars at 1 for some methods in the unadjusted p-value histograms shown in Figures 1B,D. In summary, standard DE methods can control the FPRs and scMMSTs except Poisson and TrNB can conservatively control the FPRs. FPRs of scMMSTs increase as the dispersion parameter θ decreases.
False discovery proportion-true positive rate curves for adjusted p-values are shown in Figure 2. For the Usoskin dataset, bulk RNA-seq DE methods are shown to perform well, possibly due to the high proportion of zeros and low counts (Van den Berge et al., 2018). In general, standard DE methods except MAST perform better than scMMSTs when the batch effects is not associated with group effects.

Simulations by GLMMs
Results of PCERs are shown in Supplementary Figures S2,  S3 and Supplementary Table S1. Methods performances of the FPR control were similar to those in simulations by splatter. Based on FDP-TPR curves for adjusted p-values shown in Figure 3, scMMSTs performed better than standard DE methods when batch effects were associated with weak group effects. NB-zinb was the best among all methods considered for comparisons. EdgeR-zinb and DESeq2-zinb were the best two methods among the six standard DE methods considered. TrPois and ZiPois perform very close to each other. Figure 4 demonstrates bar plots of AUCs for adjusted p-values. |β 0 |, σ 2 β , θ and µ π exhibited positive correlations with AUCs. Our scMMSTs performed better when the group effect size and its heterogeneity are larger and the counts dispersion BCV and proportion of zeros are smaller. Similar results are obtained to those of FDP-TPR curves. Therefore, our results demonstrate that scMMSTs performs better than standard DE methods when the group effect size is small with large group effect heterogeneity. Table 1 and Supplementary Figure S4 show the numbers of DE genes detected by the 12 methods considered in simulations for 11 cell types in the Usoskin dataset. This dataset was also analyzed in the zinbwave paper. MAST failed for some celltypes, so no DE gene was detected. NB-zinb defined smallest number of DE genes in general. The results of Venn diagrams and Upset plots by R packages VennDiagram (version 1.6.20) (Chen, 2018) and upsetR (version 1.4.0) (Gehlenborg, 2019) are shown in Supplementary Figures S5-S15. Since NB-zinb is conservative for FDR, the DE genes only detected by NB-zinb highly likely have weak group effects with their heterogeneity across batches. In general, scMMSTs, as supplement to standard methods, are superior at selecting DE genes with weak group effects and their heterogeneity in different batches for scRNAseq data.

Methods
FIGURE 5 | Computational times for differential expression methods on the simulated null Usoskin and Tung datasets, which were generated by splatter.
The number of cores were set to be 1 and 8 on a cluster with 24 Intel Xeon Processor (Skylake, IBRS) at 2.60 GHz (2593 MHz) and 128 GB RAM.

DISCUSSION
We proposed scMMSTs to identify DE genes, considering batch effect and zero inflation of scRNA-seq data. Both simulations and real data indicated that these methods have advantages in selecting DE genes with weak group effects and their heterogeneity in different batches. In simulations, scMMSTs conservatively controlled FPRs or type I error rates in each setting under assumptions of NB and Poisson distributions, except TrNB and Poisson assumption. However, TrNB controlled FPRs when θ is large. Second, following the model assumption, scMMST was the best one when |β 0 | was small and σ 2 β was large, especially when θ was large. In real data analysis, the Venn diagrams and Upset plots of DE genes (Supplementary Figures S5-S15) directly indicated the relationships among the DE methods. scMMATs defined smaller numbers of DE genes and NB-zinb defined the smallest. Since scMMATs are conservative, the DE genes only defined by NB-zinb are likely to have the small group effect size with its heterogeneity across batches.
Furthermore, scMMSTs exhibited three innovations. First, scMMSTs derived the association test score statistics and their theoretical null distributions in the framework of GLMMs under the binomial, Poisson and NB assumptions. Second, the group effect β was modeled as random effects associated with batches in the framework of GLMMs. Third, scMMSTs verified their effectiveness to detect DE genes with the weak group effect and its heterogeneity in different batches. However, scMMSTs have some limitations. scMMSTs performed worse than other standard DE methods to detect DE genes without group effect heterogeneity across batches. scMMSTs performed worse when the dispersion parameter θ was small, especially for the TrNB method, this may due to the non-robust estimation of θ. scMMSTs, in fact, are derived to test H 0 under the assumption β 0 = 0, not to jointly test β 0 = 0 and σ 2 β = 0. This decreases the power of testing H 0 for scMMSTs. For association tests, the Mixed effects Score Test (MiST), which jointly tests H 0 , is more powerful. Therefore, scMMSTs may be extended using the framework of GLMM-MiST (Sun et al., 2013) in future work to overcome these drawbacks. E w is used to approximate E w for the statistic T w of scMMSTs. This approximation performs worse when there are more excess zeros. Better approximations of E w or methods to efficiently calculate E w may improve the performance of scMMSTs. The weighting strategy implemented may be explained in a Bayesian framework and scMMSTs may be extended accordingly. In addition, following the idea of PEA (Shao et al., 2019), scMMSTs may be extended to efficiently identify gene-pathway interactions without permutations of test statistics. In conclusion, scMMSTs, supplements to standard single cell DE methods, are advantageous at selecting genes with the weak group effect and its heterogeneity across batches for scRNA-seq data analysis.

AUTHOR CONTRIBUTIONS
FS and HW conceived and supervised the study. ZH and FS implemented the software, conducted the simulations, analyzed the data, and wrote the manuscript. ZH and YP prepared figures and tables. ZH, YP, HW, and FS modified and reviewed the manuscript. All authors contributed to the article and approved the submitted version.