In Papyro Comparison of TMM (edgeR), RLE (DESeq2), and MRN Normalization Methods for a Simple Two-Conditions-Without-Replicates RNA-Seq Experimental Design

In the past 5 years, RNA-Seq has become a powerful tool in transcriptome analysis even though computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. It is, however, now commonly accepted that the choice of a normalization procedure is an important step in such a process, for example in differential gene expression analysis. The present article highlights the similarities between three normalization methods: TMM from edgeR R package, RLE from DESeq2 R package, and MRN. Both TMM and DESeq2 are widely used for differential gene expression analysis. This paper introduces properties that show when these three methods will give exactly the same results. These properties are proven mathematically and illustrated by performing in silico calculations on a given RNA-Seq data set.


INTRODUCTION
In the past 5 years, RNA-Seq approaches, based on high-throughput sequencing technologies, are becoming an essential tool in transcriptomics studies (cf. Wang et al., 2009). It is now commonly accepted that a normalization preprocessing step can significantly improve the quality of the analysis, in particular, for the differential gene expression analysis (cf. Bullard et al., 2010). Nevertheless, a gold standard normalization method has not yet been found. This paper deals with two widely used and very important normalization methods and a third method related to these. The first method is the "Trimmed Mean of M-values" normalization (TMM) described in Robinson and Oshlack (2010) and implemented in the edgeR package (cf. . The second method is the "Relative Log Expression" normalization (RLE) implemented in the DESeq2 package (cf. Anders and Huber, 2010;Anders et al., 2013;Love et al., 2014). The third method is the "Median Ratio Normalization" (MRN) described in Maza et al. (2013). It has been shown that TMM and RLE give similar results both with real and simulated data sets (cf. Dillies et al., 2013;Maza et al., 2013;Rapaport et al., 2013;Li et al., 2015;Reddy, 2015). These two methods, as does MRN, deal efficiently with the intrinsic bias resulting from the relative size of studied transcriptomes. Also, it has even been shown that the MRN method performs slightly better on some simulated data sets (cf. Maza et al., 2013). Moreover, many studies have shown that LRE and/or TMM methods outperform other particular methods (cf. Dillies et al., 2013;Maza et al., 2013;Reddy, 2015;Zyprych-Walczak et al., 2015;Lin et al., 2016). Nevertheless, a comprehensive comparison study of differential expression analysis methods has used LRE or TMM for ten of the eleven compared tools (cf. Soneson and Delorenzi, 2013). Finally, other more sophisticated normalization methods have been carried out by iterating one of LRE or TMM methods (cf. Kadota et al., 2012;Sun et al., 2013;Tang et al., 2015).
In this paper, all theoretical results will be illustrated by in silico calculations carried out on a given real data set from the tomato fruit set (see Materials and Methods). In short, this data set consists of a matrix of counts: 34675 rows (genes) and 9 columns (samples from 3 stages and 3 biological replicates per stage). Normalization factors of these fruit set samples, obtained by each of the TMM, RLE, and MRN methods with default settings, are presented in Table 1. Figure 1 represents the scatter plot of obtained normalization factors and corresponding library sizes. Moreover, Figure 1 contains, for all three normalization methods, the regression lines estimated from a simple linear regression modeling the relationship between default normalization factors and library sizes. It is evident in both Table 1 and Figure 1 that the three methods (with default settings) do not give the same results. Indeed, it is known that TMM normalization factors do not take into account library sizes. This fact is illustrated in Figure 1 by an almost horizontal regression line. On the contrary, RLE and MRN factors are closer to each other, and share a positive correlation with the library size. The estimation of the regression parameters of regression lines above shows that the TMM slope is not statistically significant (at 5% type I error) which is the case of both LRE and MRN slopes (see Additional file 1).
The aim of this study is to provide a deeper understanding as to why the three normalization methods quoted above share a similar normalization approach. This paper also demonstrates that, in some cases, some shared parameters (such as relative size of transcriptomes or normalization factors) are strictly equal.

Tomato's RNA-Seq Data Set
To investigate the tomato transcriptome dynamics of fruit set, RNA were isolated from flower buds (Bud) and flowers at anthesis (Ant) and post-anthesis (Pos) stages. For each stage, cDNA libraries were generated from three biological replicates and subjected to Illumina mRNA-Seq technology sequencing. Then, after mapping reads to the tomato genome sequence, we obtained a table of raw counts with 34675 rows (genes) and 9 columns (3 stages and 3 replicates per stage). These technical procedures are described in Maza et al. (2013). In this paper, for sake of simplicity, the matrix (34675 × 9) containing raw counts is denoted by X.

Computations with R Packages
All computations were done within R environment (cf. R Development Core Team, 2011). All packages are available from R or Bioconductor websites (cf. Gentleman et al., 2004). As described above, the matrix containing raw counts is denoted by X in all R command lines of given in silico examples.
The TMM normalization method is implemented in the edgeR package by means of the calcNormFactors function. For example, the default normalization factors obtained in Table 1 are obtained by the following command line: > calcNormFactors(X) The RLE normalization method is implemented in the DESeq2 package by means of the function estimateSizeFactorsForMatrix. For example, the   Table 1 are obtained using the following command line: The MRN normalization method is implemented in a homemade function called mrnFactors which is provided as an additional file (see Additional Files). For example, the default normalization factors obtained in Table 1 are obtained using the following command line: > mrnFactors(X, rep(1:3 each=3))$normFactors

Definitions of Some Important Terms
We define hereafter some important terms that are used in the studied normalization methods. More detailed definitions are given in Robinson and Oshlack (2010), Anders and Huber (2010), and Maza et al. (2013).
Both M and A-values are defined for a given gene g and its expressions on two conditions X g1 and X g2 . They represent respectively the fold change and the absolute expression level of the gene: A trimmed mean of M (respectively A)-values corresponds to the calculation of the mean after discarding a given proportion of lower and higher values. A trimmed mean with a proportion of 50% corresponding obviously to the calculation of the median.

RESULTS AND DISCUSSION
In this section, the three studied normalization methods are first described and compared. Then, three propositions are introduced and in silico results are provided to illustrate them. For the sake of clarity, mathematical proofs of these propositions are left to the end of the section. We have to underline that we focus, in this paper, on so called "scaling normalization methods" but this is just one approach, which can be limited to some specific experimental situations (cf. Maza et al., 2013). Another alternative can be the use of control genes (see, for instance, Risso et al., 2014).
We notice here that, in order to be consistent, the first paragraph below (named "Notations and Experimental design") reproduces information that have been already reported in detail in Maza et al. (2013).

Notations and Experimental Design
Let X gkr be the observed number of reads (or count) of gene g ∈ {1, . . . , G} in condition k ∈ {1, . . . , K} for biological replicate r ∈ {1, . . . , R}. For the sake of simplicity, we deal with the same number of replicates for each given condition, but the following propositions are not altered by a more general design. Let µ gk be the expectation of the true and unknown number of transcripts of a given cell for gene g in condition k; L g the length of gene g; and N kr the total number of reads in condition k for replicate r (or library size). As described in Robinson and Oshlack (2010), we can model the expected value of X gkr as where S k is the size of studied transcriptome in condition k, that is Then, for each gene g, an approximation of the expected value of the ratio between two conditions, say 1 and 2, is given by We can easily see in the equation above that the ratio of interest, in a differential analysis point of view, i.e., µ g2 µ g1 , is not directly measured by ratios of raw counts because of library sizes and relative sizes of transcriptomes. As described in Maza et al. (2013), the three methods studied here aim at removing such biases. Table 2 gives us the description of these three normalization methods. Both TMM and RLE are respectively implemented in edgeR and DESeq2 packages (see Materials and Methods). Differences and commonalities of these three methods are described here, step by step.

Description of the Three Methods
Step I Both TMM and MRN methods have a step of prenormalization of counts by library sizes. The RLE method does not and works directly on raw counts.
Step II The three methods have a reference sample. For TMM, the reference sample has been chosen here arbitrarily with k = 1 and r = 1. The default for the edgeR package consists in choosing the library whose upper quartile is closest to the mean upper quartile for all the libraries. For MRN, the condition k = 1 is also arbitrarily chosen, as in Maza et al. (2013). Moreover, for MRN, by definition, all replicates of the chosen condition are used. For RLE, a geometric mean of all sample values is performed.
Step III For all three methods, relative sizes of transcriptomes and the reference sample are based on ratios of counts (or pre-normalized counts for TMM and MRN) and the reference sample. For TMM, the set G * kr represents the set of not trimmed genes with valid M and A-values (cf. Robinson and Oshlack, 2010). By default, with the calcNormFactors function of the edgeR package, percentages of trimmed M and A-values are respectively of 30 and 5% (see Materials and Methods). In order to simplify, but still staying in the same vein, the relative scaling factor calculations are here described with an unweighted trimmed mean instead of the weighted one which is proposed by default on edgeR.
III Relative sizes of transcriptomes and reference sample, or relative scaling factors (edgeR), or size factors (DESeq2) where G * kr represents the set of not trimmed genes Relative scaling factors adjusted to multiply to 1 (edgeR)τ Step IV In this step, only the TMM method performs an adjustment of its relative scaling factors to multiply to 1.
Step V Only TMM and MRN methods take explicitly into account both relative scaling factors and library sizes.
Step VI In this step, it is clear that TMM normalization factors (as produced by the calcNormFactors function) do not take into account the library sizes but only the relative scaling factors. That explains the absence of correlation between normalization factors and library sizes in Figure 1 (see Introduction). In edgeR, these normalization factors are used as offset parameters in the statistical model for differential gene expression analysis. Once again, we underline here that values obtained in Table 1 are not estimations of the same theoretical parameters, and thus, these values can not be used in the same way, for instance, to normalize counts.
Step VII For normalization of counts, all three normalization methods take into account both relative sizes of transcriptomes and library sizes. Only the TMM method gives counts-per-million (CPM). Obviously, CPM values can easily be obtained for RLE and MRN methods by multiplying normalized counts by 10 6 .

Properties of the Three Methods
After the above detailed descriptions of our three methods, we introduce below three properties showing particular cases where all three methods give the same result.
Proposition 1 (concerning TMM and MRN). Let's assume that the following assumptions hold for the calculus of the TMM normalization method: • The reference sample is (arbitrarily) the first one (k = 1 and r = 1). • Trimming parameters of M and A-values are respectively equal to 50 and 0%. • Calculation of the trimmed mean is done without computing weights (unweighted mean).
Moreover, let R = 1 (no replicates). Then, the relative scaling factors of TMM and MRN methods are equal: An example illustrating Proposition 1 is given in Table 3. Calculations are carried out by means of R functions calNormFactors (from the edgeR package) for the TMM method and mrnFactors (see Materials and Methods) for the MRN method, as follows: > calcNormFactors(X, refColumn=1, logratioTrim=0.499, sumTrim=0, doWeighting=FALSE) > mrnFactors(X, 1:9)$medianRatios We can clearly see in Table 3 that, with function arguments corresponding to the assumptions of Proposition 1, the adjusted relative scaling factors produced by TMM and MRN methods are equal up to the third or fourth decimal place for almost all of the values (only one of the values has just two decimal places equal). This slight difference is due to the logratioTrim argument that cannot be strictly equal to 50%. Proposition 2 (concerning RLE and MRN). Let K = 2 conditions and no replicates (R = 1). Then, the size factors calculated from the RLE and MRN methods are equal: We illustrate Proposition 2 by calculating the size factors for some pairs of samples (see Table 4).

))$normFactors
We can see in Table 4 that, as introduced in Proposition 2, normalization results from both methods are equal. We must note here that, with K > 2, Proposition 2 does not hold: with more than two samples, the reference sample of the RLE method takes into account all raw counts and this is obviously not the case for the MRN method. This can be checked by straightforward calculations with more than two samples.
Proposition 3 (concerning RLE, TMM, and MRN). Let's assume that the assumptions of both Proposition 1 and Proposition 2 are satisfied. Then, normalized counts of the RLE and MRN methods, and counts-per-million of the TMM method (up to a constant) are equal:

Calculation of RLE Normalization Factors with edgeR
We note here that the calcNormFactors function contains an argument called "method" that can also be used to calculate RLE normalization factors (cf. the description of the calcNormFactors function in the edgeR package). The user should however be careful! Indeed, as calculated below, calcNormFactors with method="RLE" and estimateSizeFactorsForMatrix functions do not give the same results. What happens is that the calcNormFactors function does not work with raw counts but with pre-normalized ones (see Step I of Table 2). Moreover, the calcNormFactors function proceeds to an adjustment of values (see Step IV of Table 2). In order to find the same values with both functions, we have to proceed as follows: > calcNormFactors(X, method="RLE") > estimateSizeFactorsForMatrix(X) > f=estimateSizeFactorsForMatrix(X% * %diag (1/colSums(X))) > f/prod(f)ˆ(1/length(f))

Proofs of Propositions
Proof of Proposition 1. We first note that, with R = 1, i.e., with no biological replicates, the reference samples of both TMM and MRN methods are equal: Moreover, if we assume that (i) the trimmed mean of the TMM method is done with an unweighted mean as described in the Step III of the edgeR method in Table 2, and that (ii) the trimming values are equal to 50% of genes with upper Mvalues and 50% of genes with lower M-values, then we obtain that Proof of Proposition 2. Let's first describe the RLE method calculations by following steps of Table 2. For K = 2 and R = 1, the pseudo-reference sample is the following: Then, we directly have that Let's then describe calculations for the MRN method. For K = 2 and R = 1, the reference sample is simply the first sample: Then, the relative sizes are the following: And the proposition is proved.

CONCLUSIONS AND FURTHER WORK
This paper focus on two widely used normalization methods for RNA-Seq data and a third method related to these, that seem to give similar results and outperform many other classical methods if we consider all references given in the Introduction. Better understanding these methods is then an important issue dealt by this paper. We highlight in this paper that the three considered normalization methods deal with similar underlying ideas. Moreover, we prove that these methods give exactly the same result in some simple experimental designs. For instance, Proposition 3 shows that for two given samples, normalized counts are (up to a constant) equal.
It has also been shown in this paper that the user should carefully use and not mix these normalization methods and R packages as all concepts are not equal. For instance, the so called "normalization factors" from edgeR and "size factors" from DESeq2 are not the same theoretical parameters.
Nevertheless, it has been shown in Maza et al. (2013) that the MRN method performs slightly better on some simulated data sets with a standard experimental design of two conditions with replicates. The present paper does not explain why it performs better but attempts to give some hypotheses, inspired by the proved propositions, by focusing on what differ between the three normalization methods. We give hereafter some of these hypotheses. (i) For instance, the reference sample of the MRN method, as a mean of all replicates of a given condition, is a more robust estimation of mean counts in a given condition (more robust than the TMM method). (ii) Also, for the TMM method, the trimming parameter of M-values should perhaps (by default) be chosen around 50% in order to have a more robust estimation of the relative size of transcriptomes. (iii) Moreover, in the same way, for the MRN method, the relative sizes of transcriptomes are not sample-specific but condition-specific. Indeed, for the MRN method, these relative sizes are the same for all replicates of a condition. This should perhaps give a more robust estimation than for TMM and RLE relative sizes. All these hypotheses, among others, should be explored in forthcoming work.
Finally, we conclude here that for a very simple experimental design, i.e., about two conditions and no replicates, users can use any of the three studied normalization methods with no impact on results. But, for a more complex experimental design, the results described in Maza et al. (2013) tend to indicate that the MRN method could be adopted. However, obviously, this last hypothesis should be proved rigorously in further work.

AUTHOR CONTRIBUTIONS
EM has carried out the calculations, performed the analysis, and written the paper.