Event Abstract

Extending provenance information in CBRAIN to address reproducibility issues across computing platforms

  • 1 McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University, Canada
  • 2 University of Lyon, CNRS, INSERM, CREATIS, France
  • 3 University of Southern California, Information Sciences Institute, United States

Context: Neuroimaging tools are prone to reproducibility issues across computing platforms because numerical errors propagate along pipelines (Gronenschild et al., 2012). When different computing systems are used in the same study, these issues may alter the results and even generate false positives. We are designing a system to identify and mitigate reproducibility issues in experiments executed on distributed computing platforms. This system will extend the provenance information available in the CBRAIN web platform (Sherif et al., 2014) with system-level monitoring information captured by the Kickstart tool (Deelman et al., 2006).

Method: We processed the 150-subject ICBM dataset (Mazziotta et al., 2001) with 3 pipelines: (i) brain tissue segmentation with FSL FAST (Zhang et al., 2001); (ii) subcortical structure segmentation with FSL FIRST (Patenaude et al., 2011); (iii) cortical thickness estimation with Freesurfer (Fischl and Dale, 2000). We used FSL 5.0 (build 506) to compare results obtained on two clusters running Linux CentOS 5 and Fedora 20, respectively. We used Freesurfer 5.3.0 and compared the results obtained with its CentOS 4 and CentOS 6 x86_64 builds, both executed on the Linux Fedora 20 cluster.

Results: Brain tissue segmentations computed with FSL on CentOS 5 vs. Fedora 20 have a Dice coefficient higher than 0.999 for grey matter, white matter, and CSF. Numerical differences result in discrete, noise-like segmentation errors located mostly at the tissue interfaces (see Figure 1). Using ltrace (http://ltrace.org), we identified that these differences are due to different implementations of the exponential function (expf) in CentOS 5 (glibc 2.5) and Fedora 20 (glibc 2.18). Subcortical structure segmentations computed on CentOS 5 vs. Fedora 20 have a Dice coefficient ranging from 0.59 to 1 (see Figure 2). Cortical thickness difference maps thresholded with random field theory (RFT) show significant differences between the CentOS 4 and CentOS 6 Freesurfer builds at p<0.05 and p<0.01 (see Figure 3).

Discussion: Different computing platforms may produce substantially different results in neuroimaging pipelines. It is therefore legitimate to avoid using multiple computing platforms in a study. However, this drastically reduces the amount of available computing resources, which slows down experiments. Our provenance-based system will help identify the maximal set of resources that can be used in a study without altering its results.

Figure captions:
* Figure 1: Sum of binarized differences between brain tissue segmentations of the 150 ICBM subjects with FSL FAST on Linux CentOS 5 vs. Linux Fedora 20. From top to bottom and left to right: z=33,53,73,93,113.
* Figure 2: Histograms of Dice coefficients between segmentations obtained on CentOS 5 vs. Fedora 20 with FSL FIRST. mu: mean; sigma: standard deviation.
* Figure 3: Comparison of cortical thickness maps between the CentOS 4 and CentOS 6 Freesurfer builds. Top row: CentOS 6 vs. CentOS 4; bottom row: CentOS 4 vs. CentOS 6. From left to right, column (1): t statistics; columns (2)-(4): random field theory (RFT) maps thresholded at p<0.05, p<0.01 and p<0.001, respectively.
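The abstract does not say how the Dice coefficients or the difference map of Figure 1 were computed. As a minimal sketch, assuming label maps flattened to sequences of integer labels and subjects resampled to a common voxel grid, the two measures could be written as follows. The function names, the label values, and the toy data are hypothetical and serve only as illustration; they are not taken from FSL, Freesurfer, or CBRAIN.

```python
from typing import Iterable, List, Sequence, Tuple


def dice_coefficient(seg_a: Sequence[int], seg_b: Sequence[int], label: int) -> float:
    """Dice overlap 2|A n B| / (|A| + |B|) for one label in two flattened label maps."""
    if len(seg_a) != len(seg_b):
        raise ValueError("segmentations must have the same number of voxels")
    size_a = sum(1 for v in seg_a if v == label)
    size_b = sum(1 for v in seg_b if v == label)
    overlap = sum(1 for a, b in zip(seg_a, seg_b) if a == label and b == label)
    if size_a + size_b == 0:
        # Assumed convention: a label absent from both maps counts as perfect agreement.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


def sum_binarized_differences(
    pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]
) -> List[int]:
    """For each voxel, count the subjects whose two segmentations disagree there."""
    counts: List[int] = []
    for seg_a, seg_b in pairs:
        if len(seg_a) != len(seg_b):
            raise ValueError("segmentations must have the same number of voxels")
        if not counts:
            counts = [0] * len(seg_a)
        elif len(counts) != len(seg_a):
            raise ValueError("all subjects must share the same voxel grid")
        for i, (a, b) in enumerate(zip(seg_a, seg_b)):
            if a != b:  # binarized difference: 1 where labels differ, 0 elsewhere
                counts[i] += 1
    return counts


# Toy example with hypothetical labels (0: background, 1: CSF, 2: grey, 3: white).
platform_a = [0, 1, 1, 2, 2, 3]
platform_b = [0, 1, 2, 2, 2, 3]  # one voxel flips from label 1 to label 2

dice_grey = dice_coefficient(platform_a, platform_b, label=2)
difference_map = sum_binarized_differences(
    [(platform_a, platform_b), (platform_a, platform_a)]
)
```

In this toy case a single flipped voxel at an interface lowers the Dice coefficient of both labels it touches, and the difference map counts one disagreeing subject at that voxel. On real data the same functions would be applied per tissue class and per subject.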

Figure 1
Figure 2
Figure 3


1. Gronenschild EHBM, Habets P, Jacobs HIL, et al. The effects of FreeSurfer version, workstation type, and Macintosh operating system version on anatomical volume and cortical thickness measurements. Hayasaka S, ed. PLoS One. 2012;7(6):e38234. doi:10.1371/journal.pone.0038234.
2. Sherif T, Rioux P, Rousseau M-E, et al. CBRAIN: A web-based, distributed computing platform for collaborative neuroimaging research. Front Neurosci. 2014 (under review).
3. Deelman E, Mehta G, Vöckler J-S, Wilde M, Zhao Y. Kickstarting remote applications. Available at: https://ritdml.rit.edu/handle/1850/7350. Accessed April 28, 2014.
4. Mazziotta J, Toga A, Evans A, et al. A probabilistic atlas and reference system for the human brain: International Consortium for Brain Mapping (ICBM). Philos Trans R Soc Lond B Biol Sci. 2001;356(1412):1293–322. doi:10.1098/rstb.2001.0915.
5. Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging. 2001;20(1):45–57. doi:10.1109/42.906424.
6. Patenaude B, Smith SM, Kennedy DN, Jenkinson M. A Bayesian model of shape and appearance for subcortical brain segmentation. Neuroimage. 2011;56(3):907–22. doi:10.1016/j.neuroimage.2011.02.046.
7. Fischl B, Dale AM. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proc Natl Acad Sci U S A. 2000;97(20):11050–5. doi:10.1073/pnas.200033797.

Keywords: reproducibility, provenance, segmentation, cortical thickness, CBRAIN

Conference: Neuroinformatics 2014, Leiden, Netherlands, 25 Aug - 27 Aug, 2014.

Presentation Type: Poster, to be considered for oral presentation

Topic: General neuroinformatics

Citation: Glatard T, Lewis LB, Da Silva RF, Rousseau M, Lepage C, Rioux P, Mahani N, Deelman E and Evans AC (2014). Extending provenance information in CBRAIN to address reproducibility issues across computing platforms. Front. Neuroinform. Conference Abstract: Neuroinformatics 2014. doi: 10.3389/conf.fninf.2014.18.00076

Received: 28 Apr 2014; Published Online: 04 Jun 2014.

* Correspondence: Dr. Tristan Glatard, McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University, Montreal, Canada, tristan.glatard@concordia.ca
