Editorial: Multi-omic data integration
- 1Lazzari, Bologna, Italy
- 2Group of Clinical Genomic Networks, Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Shanghai, China
- 3Quintiles, Reading, UK
- 4Consiglio Nazionale delle Ricerche, Istituto per le Applicazioni del Calcolo, Rome, Italy
As researchers involved in molecular biology, we are witnessing tremendous paradigm changes within ever shorter time frames. The epoch-making notion, originally put forward by the central dogma of biology (Crick, 1970), that there is a unidirectional process and a privileged (genetic) level of causality at which biological functions are determined, has long been strongly challenged. It is in fact well recognised that multi-level causality, with feedback cycles among all former and newly identified biochemical levels (including small RNAs and epigenomic changes), is a fundamental attribute of biological systems (Noble, 2012).
Yet the shift in focus from single reactions to transcriptomics, promoted first by microarrays and now by sequencers, is already being challenged by a novel, pressing offer from fast-evolving technologies. Indeed, the possibility of obtaining an omic view of virtually all molecular layers (genomes, metagenomes, transcriptomes, proteomes, epigenomes) pushes us to integrate the study of systems at yet another level of complexity. This effort is hampered, and not negligibly, by the difficulties in formatting, storing, and reusing the deluge of data encompassing every level of biological organization.
Against such a complex background, it is increasingly acknowledged that the tools and theoretical frameworks that could help in combining and accounting for both the multi-level causation scheme and the burden of data are still underdeveloped (Witzany and Baluska, 2012).
From these considerations arises a novel, pressing request to design methodologies, approaches, and frameworks that allow these data to be interpreted as a whole, i.e., as intertwined molecular signatures that contain genes, proteins, mRNAs, and miRNAs, but also epigenomic characterizations and correlations with microbiome composition, to name only the major ones, and that are able to capture the inter-layer connections and the complexity of phenotypes. This request is accompanied by demands and concerns about the storage and reusability of many of these different omic data. Indeed, although publicly and freely available, these data often lie in databases and repositories that are underutilized or not used at all. Issues arising from the lack of standardization and of shared biological identities are also well known to represent a hurdle for data reuse (Tieri and Nardini, 2013; Chowdhury and Sarkar, 2015).
The “Multi-Omic Data Integration” Research Topic is intended as a dedicated forum to collect efforts that help in defining this emerging field, aimed at the integration of data, analyses, and approaches from and for multiple omics.
The articles collected here address these questions from a number of perspectives that we summarize as experimental, network based, and methodological. In the first category, the authors extract and analyse different types of high-throughput data (epitomics, localisomics, transcriptomics, lipidomics) from different locations in model organisms [Arabidopsis thaliana (Wilson et al., 2015) and rhesus macaques (Lee et al., 2014)] to address complex biological questions (root growth and response to anti-malarial drugs, respectively) that could not be addressed with single-omic approaches.
We transition from these approaches to more theoretical ones via the use of graphs. Networks offer a complete, intuitive, versatile, and powerful approach to the representation of complex systems (genomics, epigenomics, transcriptomics, metabolomics, host-microbiome interface, disease phenomics), which is exploited here to represent the multifaceted aspects of complex autoimmune diseases (rheumatoid arthritis, Tieri et al., 2014) in order to evaluate complex side effects of old and novel therapies; to identify disease molecules that can be both effective therapeutic targets and relevant progression markers, with application to diabetic nephropathy (Heinzel et al., 2014); and to stratify patients with comorbidities (Moni and Lio, 2015).
Methodological approaches point with novel emphasis at the importance of the spatial localization of molecules in the omic context. From polysome and ribosome profiling and the annotation and standardization of RNA and miRNA binding sites (Dassi and Quattrone, 2014), to networks that include the 3D proximity of molecules thanks to Chromosome Conformation Capture (3C) and its omic version Hi-C (Merelli et al., 2015), spatial representation contributes an important layer of information to this added multi-omic complexity.
Beyond spatial organization, temporal progression and causal inference are discussed to model the heterogeneity of CD4+ T cells and their complex immune responses (Carbo et al., 2014), and to predict gene networks based on ChIP-seq and RNA-seq integration (Angelini and Costa, 2014).
Finally, meta-analyses of genomes, whether for the exploration of microbiome composition or for disease genome-wide association studies (GWAS), still benefit from discussion in this Research Topic: on one side, because of the need to standardize the workflow (Ladoukakis et al., 2014) in a relatively novel research area (omic microbiology), and on the other side, to compensate with multi-omic layers for the limited statistical power and reproducibility of GWAS (Lin et al., 2014).
This collection is the tip of an iceberg that continues to grow and to evolve in multiple directions. From the continuously improving efficiency of existing high-throughput platforms, which implies easier, cheaper, and more frequent spatio-temporal sampling, to the input of novel technologies that will offer omic views on novel types of data (phenotypes, tissues, 3D proteins, etc., all entailing the production and approval of dedicated standards for data storage), we are only at the beginning of almost endless possibilities for data integration.
However, to avoid getting lost in the sea of data, efficient algorithms, as well as biologically meaningful directions in which to integrate information, will be important. This will imply not only the implementation of powerful tools to give answers, but also the design of careful approaches to form questions.
We hope and foresee that these needs will further foster collaboration among biologists, medical doctors, statisticians, and computer scientists, transforming the residual perception of this forced cooperation from a burden into a resource. The impact of completing this other type of integration, among scientific expertise, is difficult to predict at large, but it can easily be assumed to be a necessary and crucial starting point for the effective implementation of personalized medicine, where patients' and health practitioners' needs are translated into technology and reports on systemic markers, offering patients the possibility to be treated as a whole and not as a mere assemblage of parts to be “adjusted.”
This project is partially funded by MoST international cooperation program no. 2013DFA30790, NSFC no. 31070748, EC FP7-PEOPLE-2011-IRSES program, project 294935 “KEPAMOD,” and CAS fellow grant no. 2011Y1SA04.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
Angelini, C., and Costa, V. (2014). Understanding gene regulatory mechanisms by integrating ChIP-seq and RNA-seq data: statistical solutions to biological problems. Front. Cell Dev. Biol. 2:51. doi: 10.3389/fcell.2014.00051
Carbo, A., Hontecillas, R., Andrew, T., Eden, K., Mei, Y., Hoops, S., et al. (2014). Computational modeling of heterogeneity and function of CD4+ T cells. Front. Cell Dev. Biol. 2:31. doi: 10.3389/fcell.2014.00031
Heinzel, A., Perco, P., Mayer, G., Oberbauer, R., Lukas, A., and Mayer, B. (2014). From molecular signatures to predictive biomarkers: modeling disease pathophysiology and drug mechanism of action. Front. Cell Dev. Biol. 2:37. doi: 10.3389/fcell.2014.00037
Lee, K. J., Yin, W., Arafat, D., Tang, Y., Uppal, K., Tran, V., et al. (2014). Comparative transcriptomics and metabolomics in a rhesus macaque drug administration study. Front. Cell Dev. Biol. 2:54. doi: 10.3389/fcell.2014.00054
Lin, D., Zhang, J., Li, J., He, H., Deng, H. W., and Wang, Y. P. (2014). Integrative analysis of multiple diverse omics datasets by sparse group multitask regression. Front. Cell Dev. Biol. 2:62. doi: 10.3389/fcell.2014.00062
Merelli, I., Tordini, F., Drocco, M., Aldinucci, M., Lio, P., and Milanesi, L. (2015). Integrating multi-omic features exploiting chromosome conformation capture data. Front. Genet. 6:40. doi: 10.3389/fgene.2015.00040
Wilson, M. H., Holman, T. J., Sorensen, I., Cancho-Sanchez, E., Wells, D. M., Swarup, R., et al. (2015). Multi-omics analysis identifies genes mediating the extension of cell walls in the Arabidopsis thaliana root elongation zone. Front. Cell Dev. Biol. 3:10. doi: 10.3389/fcell.2015.00010
Keywords: multi-omics, multi-omic data integration, integration, systems biology, network analysis
Citation: Nardini C, Dent J and Tieri P (2015) Editorial: Multi-omic data integration. Front. Cell Dev. Biol. 3:46. doi: 10.3389/fcell.2015.00046
Received: 27 May 2015; Accepted: 25 June 2015; Published: 07 July 2015.
Edited by: Raina Robeva, Sweet Briar College, USA
Reviewed by: Matteo Barberis, University of Amsterdam, Netherlands
Copyright © 2015 Nardini, Dent and Tieri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Paolo Tieri, firstname.lastname@example.org