Brain-CODE: A large-scale neuroinformatics platform for deep and broad data
Francis Jeanson1*, Shiva Amiri1, Luca F. Pisterzi1, Mojib Javadi1, Janice Pong1, Kenneth R. Evans2, Anthony Vaccarino3, Moyez Dharsee3, Ken Edgecombe4, Costa Dafnas4, Chris F. MacPhee4, Stephen Strother5, Tom Gee5, Tanya Schmah5, Fan Dong5, Jin Chen5, Anita Oder5, Nima Nourhaghighi6, Xiaogang Wang7, Igor Solovey8 and Donald T. Stuss1
1 Ontario Brain Institute, Canada
2 Queen's University, Indoc Research, Canada
3 Indoc Research, Canada
4 Queen's University, HPCVL, Canada
5 Baycrest, Rotman Research Institute, Canada
6 Sunnybrook Research Institute, Canada
7 Sunnybrook Health Sciences Centre, CPSR, Canada
8 Western University, Robarts Research Institute, Canada
The Ontario Brain Institute (OBI) funds over 35 institutions in the province of Ontario that collaborate to form disorder-based programs centered on clinical, molecular, and imaging research in neuroscience. These programs, which span multiple institutions, cover epilepsy, cerebral palsy, neurodevelopmental disorders (for example, autism and ADHD), neurodegeneration (for example, Parkinson’s, Alzheimer’s, and ALS), and depression, and are known collectively as the Integrated Discovery Programs (ID programs). The quality, depth, and diversity of the studies undertaken by these programs, and therefore of the resulting datasets, represent a significant big data opportunity for neuroscience. To make optimal use of the rich research data collected by the ID programs, OBI is implementing a large-scale informatics platform in which data streams from each program are standardized and assimilated, allowing analyses within and across disease states. This web-based platform is called Brain-CODE. OBI is working with other stakeholders, and leveraging existing infrastructure where possible, to establish and manage the platform.
Core Functionalities
The five core platform functionalities of Brain-CODE are data capture, data federation, data integration, collaborative data sharing, and secure data analysis. Each is being developed by learning from existing platforms, technologies, and expertise, and by working closely with researchers, who are the end users. We briefly outline the current efforts to address each.
Data Capture Applications
From an informatics perspective, data capture means provisioning software tools that let researchers or study subjects enter data through familiar and functionally complete applications, together with automated data capture from basic science data technologies or imaging devices. In Brain-CODE this is accomplished using a Privacy by Design approach [1], with high security standards, policies, and data protection technology that enable data producers to transfer and store data, including personal health information (PHI) of study subjects. Third-party data users access only de-identified datasets, which protects study subjects from being re-identified.
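One common way to reason about whether a de-identified dataset still exposes study subjects is k-anonymity: every combination of quasi-identifying values should be shared by at least k records. The sketch below is only an illustration of that general idea; the abstract does not say which risk measure Brain-CODE uses, and the field names and sample records are hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share the
    same values for every quasi-identifier field."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case chance of singling out one subject, taken as 1/k."""
    return 1.0 / smallest_group_size(records, quasi_identifiers)


# Hypothetical de-identified records (no real data).
records = [
    {"age_band": "40-49", "sex": "F", "region": "East"},
    {"age_band": "40-49", "sex": "F", "region": "East"},
    {"age_band": "50-59", "sex": "M", "region": "West"},
    {"age_band": "50-59", "sex": "M", "region": "West"},
    {"age_band": "50-59", "sex": "M", "region": "West"},
]

fields = ["age_band", "sex", "region"]
print(smallest_group_size(records, fields))
print(max_reidentification_risk(records, fields))
```

A release policy could then require the worst-case risk to stay below a chosen threshold before a dataset is shared with third-party users.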
Currently, OBI is leveraging several data capture applications for researchers, including REDCap [2] and OpenClinica [3] for clinical data, BASE for molecular and genomics data, and SPReD [4], which is based on XNAT [5], for imaging data. In addition, the Brain-CODE development team, led by the InDOC consortium [6], has developed a custom tool for subject profile data entry known as the ‘Subject Registry’, along with an integration layer that combines the data from the multiple applications. Together, these tools provide a rich set of capabilities for the design of ‘electronic Case Report Forms’ (eCRFs) and the administration of clinical instruments.
Integration
The diversity of data being captured in Brain-CODE requires the adoption of powerful data integration tools. To date, the Brain-CODE team has identified a number of open and proprietary tools that can complement one another for this purpose, including BioMart [7] and IBM InfoSphere [8]. We continue to evaluate and identify high-performance tools that will connect to the diverse set of data and manage user access to these data in Brain-CODE.
Federation
To facilitate the sharing and analysis of data across neurological diseases and data types, two important steps have been taken and standardized across all OBI participating institutions. First, all subjects have their unique provincial health card identification number stored in an encrypted format on Brain-CODE. This will allow research data for a single subject to be linked with data from other studies or from other federated platforms, giving data scientists and researchers a much greater understanding of the health condition and history of their subjects. The encryption is performed within the user's web browser, and the original number never leaves the research site; only the ciphertext is transmitted and stored in the Subject Registry. Furthermore, the private key required for decryption is maintained by a third party and is not known to Brain-CODE. The encryption algorithm, developed by the Electronic Health Information Laboratory (EHIL), a member of the InDOC consortium, has a homomorphic property that allows mathematical operations and comparisons to be applied to the encrypted data itself, i.e. without the need for decryption.
These encryption capabilities can be applied to other sensitive data stored in Brain-CODE; they not only provide robust safeguards against re-identification but also enable secure data integration. For example, data stored in Brain-CODE can be securely linked with administrative health databases maintained by the Institute for Clinical Evaluative Sciences (ICES) using encrypted health card numbers. Encrypted deterministic or probabilistic linkages can also be performed with other data repositories (Ontario Health Study, NIH health databases, etc.) without requiring either party to disclose PHI. Other algorithms developed by EHIL are used to determine the risk of re-identification for particular datasets requested by researchers.
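The idea of deterministic linkage without disclosing the identifier can be illustrated with a much simpler stand-in than the scheme described above. The sketch below uses a keyed hash (HMAC-SHA256) rather than EHIL's public-key homomorphic encryption, which the abstract does not specify; the shared key, function names, record fields, and health card numbers are all hypothetical.

```python
import hashlib
import hmac


def linkage_token(health_card_number: str, shared_key: bytes) -> str:
    """Derive a deterministic token from an identifier.

    Formatting differences (spaces, dashes, letter case) are normalized so
    that the same number always yields the same token.
    """
    normalized = "".join(ch for ch in health_card_number if ch.isalnum()).upper()
    return hmac.new(shared_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(research_rows, admin_rows):
    """Join two datasets on their tokens; neither holds the raw number."""
    admin_by_token = {row["token"]: row for row in admin_rows}
    return [
        {**row, **admin_by_token[row["token"]]}
        for row in research_rows
        if row["token"] in admin_by_token
    ]


# Assumption: both data custodians hold the same secret key (made up here).
key = b"example-key-not-for-real-use"

research_rows = [
    {"token": linkage_token("1234-567-890", key), "diagnosis": "epilepsy"},
    {"token": linkage_token("9999-000-111", key), "diagnosis": "depression"},
]
admin_rows = [
    {"token": linkage_token("1234 567 890", key), "hospital_visits": 3},
]

linked = link_records(research_rows, admin_rows)
print(linked)
```

Unlike the scheme in the text, this stand-in relies on a shared secret and supports only equality comparison, so it shows the linkage step but not the homomorphic operations or the third-party key custody.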
Standardization and Common Data Elements
Second, Common Data Elements (CDEs) have been identified using a Delphi consensus process [9] with researchers across the participating institutions. The demographic and clinical instruments that each study will use have been selected through this process [10]. The CDEs are based on existing international standards from NINDS [11] and CDISC [12], which will allow datasets that range widely in disease type and modality to be compared more effectively, enhancing their analytical value. In addition, OBI is actively engaged with researchers in similar efforts to establish imaging and ‘-omics’ data standards. These efforts will further support the formation of new hypotheses and discoveries for patient-centered care.
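The practical effect of CDEs can be sketched as a mapping from each site's local field names to one shared vocabulary, so that records from different studies become directly comparable. The site names, field names, and mapping below are hypothetical and are not taken from the Brain-CODE, NINDS, or CDISC standards.

```python
# Hypothetical per-site mappings from local field names to common names.
CDE_MAPPING = {
    "site_a": {"dob": "birth_date", "gender": "sex", "edu_yrs": "education_years"},
    "site_b": {"birthDate": "birth_date", "Sex": "sex", "yearsEducation": "education_years"},
}


def to_common_elements(site, record):
    """Rename a site-specific record's fields to the common data elements.

    Raises KeyError when a field has no agreed mapping, so that gaps in the
    standard are noticed instead of being silently dropped.
    """
    mapping = CDE_MAPPING[site]
    unmapped = set(record) - set(mapping)
    if unmapped:
        raise KeyError(f"no CDE mapping for fields: {sorted(unmapped)}")
    return {mapping[field]: value for field, value in record.items()}


record_a = to_common_elements("site_a", {"dob": "1970-01-01", "gender": "F", "edu_yrs": 16})
record_b = to_common_elements("site_b", {"birthDate": "1965-05-05", "Sex": "M", "yearsEducation": 12})

# Both records now share the same keys and can be pooled for analysis.
print(sorted(record_a) == sorted(record_b))
```

Failing loudly on unmapped fields is a design choice in this sketch; a production pipeline might instead log them for review by the standards group.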
Other rich data collection initiatives and services exist across Ontario, Canada, and internationally, with which Brain-CODE has the potential to establish partnerships for data linking and sharing. These federation efforts are currently underway in the form of early pilot studies and data transfer using high-security protocols. As these efforts take shape, it will become possible to link administrative health information or other rich research data for individual patients or across populations. This federation is expected to enable new scientific insight into patient health, comorbidities, and longitudinal medical profiling, and a deeper understanding of the causal mechanisms behind neurological diseases.
Data Sharing and Analytics
Finally, the Brain-CODE team has recently entered a new design cycle for the Portal functionality and for building the analytics capacity of the platform. Importantly, the diversity of skill sets among portal users is being taken into account, together with the wide variety of data types collected on the platform. Specialized data analysis tools will be integrated in the initial phase to address clinical, imaging, genomic, and proteomic data analysis. Visualization tools for ‘Visual Analytics’ will also be evaluated and integrated into the platform to facilitate quick and intuitive investigation of the rich datasets. To promote the development and integration of combined data analysis software, a developer workspace will be implemented to attract data science and analytics talent to de-identified, yet rich, sets of data, carefully selected to address novel challenges in ‘deep and broad’ data analysis. This developer space could provide access not only to research data but also to computational resources and technology via web applications and APIs. This approach should help promote interest in, and the creation of, new algorithms and tools for greater discovery in the growing data environment of Brain-CODE.
The collaboration and analytics capacity of Brain-CODE will grow as researchers embrace the potential of sharing their data, with the crucial collaboration of data partners and technology partners and the training of expertise in data management and analysis. With OBI’s commitment to empowering researchers and data scientists in this process of discovery, genuinely rich opportunities are becoming possible for advanced discovery and patient care in the 21st century.
References
[1] http://www.privacybydesign.ca, accessed 04/04/2014
[2] http://project-redcap.org, accessed 04/04/2014
[3] https://www.openclinica.com, accessed 04/04/2014
[4] https://sites.google.com/a/research.baycrest.org/informatics/spred, accessed 04/04/2014
[5] http://www.xnat.org, accessed 04/04/2014
[6] http://www.ocbn.ca/informatics.htm, accessed 04/04/2014
[7] http://www.biomart.org, accessed 04/04/2014
[8] http://www-01.ibm.com/software/data/infosphere, accessed 04/04/2014
[9] Norman Dalkey, Olaf Helmer (1963) An Experimental Application of the Delphi Method to the Use of Experts. Management Science, 9(3), pp 458-467
[10] https://braincode.ca/content/standards, accessed 04/04/2014
[11] http://www.commondataelements.ninds.nih.gov, accessed 04/04/2014
[12] http://www.cdisc.org, accessed 04/04/2014
Acknowledgements
We would like to acknowledge the Province of Ontario for its funding support as well as the InDOC consortium for their dedicated work on the Brain-CODE platform.
Keywords: neuroinformatics, platforms, data capture, data integration, data federation, data sharing, analytics, common data elements
Conference: Neuroinformatics 2014, Leiden, Netherlands, 25 Aug - 27 Aug, 2014.
Presentation Type: Poster, to be considered for oral presentation
Topic: Infrastructural and portal services
Citation: Jeanson F, Amiri S, Pisterzi LF, Javadi M, Pong J, Evans KR, Vaccarino A, Dharsee M, Edgecombe K, Dafnas C, MacPhee CF, Strother S, Gee T, Schmah T, Dong F, Chen J, Oder A, Nourhaghighi N, Wang X, Solovey I and Stuss DT (2014). Brain-CODE: A large-scale neuroinformatics platform for deep and broad data. Front. Neuroinform. Conference Abstract: Neuroinformatics 2014. doi: 10.3389/conf.fninf.2014.18.00018
Copyright:
The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers.
They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.
The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.
Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.
For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.
Received: 04 Apr 2014; Published Online: 04 Jun 2014.
* Correspondence: Dr. Francis Jeanson, Ontario Brain Institute, Toronto, Ontario, M5G 2K8, Canada, fjeanson@yahoo.com