
Review Article. Provisionally accepted; the full text will be published soon.

Front. Genet. | doi: 10.3389/fgene.2019.00627

Multitable Methods for Microbiome Data Integration

  • ¹Montreal Institute for Learning Algorithms (MILA), Canada
  • ²Department of Statistics, School of Humanities and Sciences, Stanford University, United States

The simultaneous study of multiple measurement types is a frequently encountered
problem in practical data analysis. It is especially common in microbiome
research, where several sources of data (for example, 16S rRNA, metagenomic,
metabolomic, or transcriptomic data) can be collected on the same physical
samples \citep{Franzosa2015, McHardy2013}. As is often the case when new data
sources become more readily available and open up new types of scientific
questions, there has been a proliferation of proposals for analyzing such
multitable microbiome data \citep{Fukuyama2017, Rahnavard2017, Chaudhary2017,
Chalise2017, perez2014genome}.

However, stepping back from the rush of new multitable methods in the
microbiome literature, it is worth recognizing the broader landscape of
multitable methods, which have proven relevant in problem domains ranging
across economics \citep{hannan1967canonical}, robotics
\citep{vlassis2000supervised}, genomics \citep{gomez2014data}, chemometrics
\citep{pages2001multiple}, and neuroscience \citep{ashish2010neuroscience}.
Depending on the context, these techniques go by names such as data
integration, multi-omic, or multitask methods. Of course, there is no unique
optimal algorithm to use across domains: each instance of the multitable
problem has specific structure or sources of variation that are worth
incorporating into the methodology.

Our purpose here is not to develop new algorithms, but rather to (1) distill
relevant themes across different analysis approaches and (2) provide concrete
workflows for approaching analysis as a function of the ultimate analysis goals
and data characteristics (heterogeneity, dimensionality, sparsity, etc.). Toward
the second goal, we have made code for all analyses and figures available online at

Keywords: microbiome, microbiome analysis, data integration, multi-omic analysis, dimensionality reduction, heterogeneous data analysis

Received: 14 Oct 2018; Accepted: 17 Jun 2019.

Copyright: © 2019 Sankaran and Holmes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Kris Sankaran, Montreal Institute for Learning Algorithms (MILA), Montreal, Canada