Incorporating phylogenetic information in microbiome abundance studies has no effect on detection power and FDR control

We consider the problem of incorporating evolutionary information (e.g. taxonomic or phylogenic trees) in the context of metagenomics differential analysis. Recent results published in the literature propose different ways to leverage the tree structure to increase the detection rate of differentially abundant taxa. Here, we propose instead to use a different hierachical structure, in the form of a correlation-based tree, as it may capture the structure of the data better than the phylogeny. We first show that the correlation tree and the phylogeny are significantly different before turning to the impact of tree choice on detection rates. Using synthetic data, we show that the tree does have an impact: smoothing p-values according to the phylogeny leads to equal or inferior rates as smoothing according to the correlation tree. However, both trees are outperformed by the classical, non hierachical, Benjamini-Hochberg (BH) procedure in terms of detection rates. Other procedures may use the hierachical structure with profit but do not control the False Discovery Rate (FDR) a priori and remain inferior to a classical Benjamini-Hochberg procedure with the same nominal FDR. On real datasets, no hierarchical procedure had significantly higher detection rate that BH. Although intuition advocates the use of a hierachical structure, be it the phylogeny or the correlation tree, to increase the detection rate in microbiome studies, current hierachical procedures are still inferior to non hierachical ones and effective procedures remain to be invented.


26
The microbiota, loosely defined as the collection of microbes that inhabit a given The RF distance is defined on topologies, i.e. trees without branch lengths, 144 and based on elementary operations: branch contraction and branch expansion. 145 A branch contraction step creates a polytomy in the tree by shrinking a branch 146 and merging its two ending nodes whereas a branch expansion step resolves a 147 polytomy by adding a branch to the tree. For any pair of trees, it is possible to 148 turn one tree into the other using only elementary operations. The RF distance 149 is the smallest number of operations required to do so. Note that the RF distance 150 gives the same importance to all branches, no matter how short or long. 151 The BHV distance is defined on trees and accounts for both topology and 152 branch length. It is based on an embedding of tree into a treespace with a 153 complex geometry. All trees with the same topology are mapped to the same 154 orthant, and hyperplanes share a common boundary if and only if they are at RF-155 distance 2 (one contraction and one expansion step away). For any pair of trees, 156 there is a path in treespace between those two trees. The BHV distance is the 157 length of the shortest of these paths. It can be thought of as the generalization of 158 the RF-distance that upweights long branches and downweights short branches. 159 2.3 Forest of trees 160 We generated a forest of boostrapped trees and a forest of random trees in 161 the following way. For the boostrapped forest, we generated N B bootstrap 162 datasets using resampling with replacement (Felsenstein, 1985;Wilgenbusch 163 et al., 2017). Each bootstrap dataset was used to compute a correlation matrix 164 and a correlation tree as detailed in Sec. 2.1.

165
Random trees were generated from a seed tree by shuffling the leaves labels.

166
This allowed us to generate a forest of random trees with the same number 167 of branches as the seed tree. This is especially important for RF-distances as 168 they scale with the number of branches and we want to study both non-binary 169 taxonomic trees with a high number of polytomies and low number of branches 170 and binary correlation trees, with a high number of branches. We generated N T 171 random trees from the taxonomic tree and N C from the correlation tree. The correlation tree is reconstructed from abundance profiles rather than molec-174 ular sequences and/or lineages and may therefore be poorly estimated. We use 175 the bootstrap forest to compute a confidence region around the correlation tree.
The random trees were used to create a null distribution of distances between 177 random trees.

178
The full set of 2 + N B + N T + N C trees was used to construct BHV and 179 RF distance matrices. The distance matrices were then used to visualize a to test whether the taxonomy was in the confidence region of the correlation tree 183 whereas random trees were used to test whether the taxonomic and correlation 184 trees were closer to each other than to random trees. 185 We also compared the distance from the correlation tree to each group of 186 trees using a one-way ANOVA.  194 In this paper, the focus is not on normalization and we used most classi-195 cal approaches in order to assess the impact of taking into account the data 196 hierarchical structure in the differential abundance testing. 197 We briefly present two methods for differential abundance testing (DAT) 198 that leverage a tree-like structure: z-score smoothing as proposed in Xiao et al. 199 (2017) and hFDR as proposed in Yekutieli (2008). Given any taxa-wise DAT procedure, p-values (p 1 , . . . , p n ) are first computed for each taxa (leaves of the tree) and then transformed to z-scores using the inverse cumulative distribution function of the standard Gaussian. Similarly, the tree is first transformed into a patristic distance matrix (D i,j ) and then into a correlation matrix C ρ = (exp (−2ρD i,j )) between taxa. The z-scores z = (z 1 , . . . , z n ) are then smoothed using the following hierarchical model: where µ captures the effect size of each taxa. The maximum a posteriori estimator µ * of µ is given by Hierarchical FDR (hFDR) considers a different framework where differential 208 abundance can be tested not only for a single taxa but also for groups of taxa, 209 corresponding to inner nodes or clades of the tree. hFDR uses a top-down 210 approach: tests are performed sequentially and only for nodes whose parent node 211 were previously rejected. Formally, the procedure is described in Algorithm 1. S ← (S \ N ) ∪ (N \ L) Update stack 8: end while 9: return D for full-tree discoveries or D ∩ L for leaves discoveries hFDR guarantees an a posteriori global FDR control for leafs at level 217 α = 1.44 × α × #discoveries + #families tested #discoveries + 1 .
(1) Figure 1 Example wokflow of hFDR. We tested the impact of tree choice on the performance of both procedures (z-score 237 smoothing and hFDR) on real data and synthetic data simulated from real dataset 238 in one of two following ways. The code and data used to perform the simulations 239 are available on the github repository github.com/abichat/correlationtree analysis.   We used seven different datasets for the experimental part (see Table 1  They also represent diverse microbiome with contrasted biodiversity levels: level and taxa with a prevalence lower than 5% were filtered out.  In all studied datasets, the correlation tree is closer to its bootstrap replicates 321 than to either the taxonomy or the randomized trees (Fig. 3, top row). The region of the correlation tree, nor closer to it than a randomized tree.

332
The only exception is the Chlamydiae dataset, where the phylogeny is within 333 the confidence region of the correlation (Sup. Fig. S1). Note however that this  Similar results are observed when using RF distance instead of BHV distance 345 (Sup. Fig. S2).

346
Athough phylogeny (resp. taxonomy) are evolutionary (resp. ecologically)  were shifted by more than 10 −2 units in less than 5% of the simulations.

377
Concerning FDR control, the standard BH procedure was the only one The quasi-equivalence between BH and correlation tree is not surprising given 388 the absence of smoothing when using the correlation tree. The comparatively 389 bad result of the taxonomy is also expected from our simulation settings as the 390 taxonomy is independent from simulated differential abundance. Forcing the 391 discoveries to be close in the tree therefore introduces a systematic bias and 392 results in a loss of power, especially for differential taxa that are isolated, and 393 an increase in false discoveries, especially for non-differential taxa that are close 394 to differential ones.

395
The better results of a priori uninformative random trees compared to the 396 taxonomy were however more surprising, especially in light of the similar levels 397 of smoothing for all those trees. It turned out that the random trees were, on 398 average, closer to the correct correlation structure of differential taxa than the 399 taxonomy and therefore had a lesser negative impact on the detection power.
400 Figure 4 Average absolute difference between z-scores before and after smoothing. In most simulations, smoothing only marginally changes the results.
It is clear from these results that using a tree reflecting the true data structure,  additional OTUs, at a comparable global FDR of α = 0.324.

431
Abundance boxplots of these three additional OTUs (Fig. 6, insets E and all its ancestors so that it was also tested and rejected.

443
With this top-down approach, the correlation tree is a better candidate 444 hierarchy than the phylogeny. Indeed, the signals of differential OTUs can be 445 averaged out with noise and/or conflicting signal in the phylogeny, they are 446 pooled together in the correlation tree. This makes it easier to reject high level 447 internal nodes and descend the tree toward differential OTUs.

448
It should be noted however that the a posteriori global FDR is quite high at  (ii) has high prevalence (≥0.75%) and abundance in at least one other food type. 468 We can thus classify those 22 as true positives rather than false discoveries.

469
The abundance profiles of the 18 OTUs found only by the correlation tree   The z-scores were thus smoothed to a higher extent but this had almost no 501 impact on the number of detected genera. In this work, we investigated the relevance of incorporating a priori information 519 in the form of a phylogenetic tree in microbiome differential abundance studies.

522
The rationale rests upon the assumption that evolutionary similarity reflects 523 phenotypic similarity. Taxa from the same clade should therefore be more likely 524 to be simultaneously associated to a given outcome than distantly related taxa.

525
Although this assumption sounds natural and supported by evidence for high i.e. species within the same ecological guild could replace each other during the 530 assembly process. 531 We considered here whether the phylogeny and taxonomy were good a 532 priori trees to capture the structure of the abundance data, as captured by the 533 correlation tree. In all the environments we studied, we found that the taxonomy 534 and/or the phylogeny were significantly different from the correlation tree. Taxa 535 with very similar abundance profiles could be widely spread in the phylogeny 536 and vice-versa. The phylogeny was on average no closer to the correlation tree 537 than a random tree, and thus not a good proxy of the abundance data structure. 538 We further studied the impact of tree misspecification on two recently pub-539 lished tree-based testing procedures, z-score smoothing (Xiao et al., 2017) and 540 hFDR top-down rejection (Yekutieli, 2008).

541
Concerning z-score smoothing, we showed on synthetic data that substitut-542 ing the correlation tree to the phylogeny increased the detection rate. Quite 543 surprisingly, replacing the phylogeny with a random tree also increased the 544 detection rate (Fig. 5), questioning the use of the phylogeny in the first place.

545
The results were even more disappointing on real datasets where all trees led to 546 similar detection rates and none of them significantly outperformed standard 547 were limited (Sup. Fig. S7) and stemmed mostly from the way p-values were 549 computed: i.e. using permutations for z-score smoothing and closed formula for 550 BH. Overall, using phylogenetic information to smooth z-scores degrades the 551 detection rate (at worst) or leaves it unchanged (at best).

552
Top-down rejection (hFDR) gave more interesting results. Replacing the 553 phylogeny or taxonomy with the correlation tree increased the detection rate, 554 while preserving the global a posteriori FDR. In general, taxa detected with the 555 correlation tree but not with the phylogeny belonged to clades of mostly non-556 differential taxa in the phylogeny (Fig. 6). Their signal was thus averaged with 557 noise and they discarded early-on in the hierarchical procedure. In contrast, they with support for more complex tests), we recommend sticking to the time-tested 571 BH procedure for differential abundance analysis.