Technology Report ARTICLE
Integration of “omics” data and phenotypic data within a unified extensible multimodal framework
- 1McGill Centre for Integrative Neuroscience, Montreal Neurological Institute, Canada
- 2Montreal Neurological Institute, McGill University, Canada
- 3Ludmer Centre for Neuroinformatics and Mental Health, McGill University, Canada
- 4Lady Davis Institute (LDI), Canada
- 5Douglas Hospital Research Centre, Canada
- 6Jewish General Hospital, Canada
Analysis of “omics” data is often a long and segmented process, encompassing multiple stages from initial data collection to processing, quality control, and visualization. The cross-modal nature of recent genomic analyses renders this process challenging to both automate and standardize; consequently, users often resort to manual interventions that compromise data reliability and reproducibility. This in turn can produce multiple versions of datasets across storage systems. As a result, scientists can lose significant time and resources trying to execute and monitor their analytical workflows and encounter difficulties sharing versioned data. In 2015, the Ludmer Centre for Neuroinformatics and Mental Health at McGill University brought together expertise from the Douglas Mental Health University Institute, the Lady Davis Institute, and the Montreal Neurological Institute (MNI) to form a genetics/epigenetics working group. The objectives of this working group are to i) design an automated and seamless process for (epi)genetic data that consolidates heterogeneous datasets into the LORIS open-source data platform, ii) streamline data analysis, iii) integrate results with provenance information, and iv) facilitate structured and versioned sharing of pipelines for optimized reproducibility using high-performance computing (HPC) environments via the CBRAIN processing portal. 
This paper outlines the resulting generalizable “omics” framework and its benefits, specifically the ability to i) integrate multiple types of biological and multimodal datasets (imaging, clinical, demographic, and behavioural), ii) automate the process of launching analysis pipelines on HPC platforms, iii) remove the bioinformatic barriers that are inherent to this process, iv) ensure standardization and transparent sharing of processing pipelines to improve computational consistency, v) store results in a queryable web interface, vi) offer visualization tools to better view the data, and vii) provide the mechanisms to ensure usability and reproducibility. By automating analysis pipelines and seamlessly linking multimodal data, this framework reduces human error and facilitates brain research discovery, allowing investigators to focus on research instead of data handling.
Keywords: workflow, Omics analysis, Longitudinal Studies, integrative neuroscience, Biostatistics, reproducibility, Multimodal, database, HPC, automated, Genomics, Genetics
Received: 20 Aug 2018;
Accepted: 16 Nov 2018.
Edited by: Sook-Lei Liew, University of Southern California, United States
Reviewed by: Rupert W. Overall, Deutsche Zentrum für Neurodegenerative Erkrankungen, Helmholtz-Gemeinschaft Deutscher Forschungszentren (HZ), Germany
Vincent Frouin, Neurospin, France
Copyright: © 2018 Das, Lecours Boucher, Rogers, Chouinard-Decorte, Oros Klein, Makowski, Beck, Rioux, Brown, Mohaddes, Zweber, Foing, Forest, O’Donnell, Clark, Greenwood, Meaney and Evans. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Mr. Xavier Lecours Boucher, McGill Centre for Integrative Neuroscience, Montreal Neurological Institute, Montreal, Quebec, Canada, email@example.com