Diffantom: Whole-Brain Diffusion MRI Phantoms Derived from Real Datasets of the Human Connectome Project

Diffantom is a whole-brain digital phantom generated from a dataset from the Human Connectome Project. Diffantom is presented here to be openly and freely distributed along with the diffantomizer workflow to generate new diffantoms. We encourage the neuroimage community to contribute with their own diffantoms and share them openly.


INTRODUCTION
Fiber tracking on dMRI data has become an important tool for the in vivo investigation of the structural configuration of fiber bundles at the macroscale. Tractography is fundamental to gain information about white matter (WM) morphology in many clinical applications like neurosurgical planning (Golby et al., 2011), post-surgery evaluations (Toda et al., 2014), and the study of neurological diseases as in Chua et al. (2008) addressing multiple sclerosis and Alzheimer's disease. The analysis of structural brain networks using graph theory is also applied on tractography, for instance in the definition of the unique subject-wise patterns of connectivity (Sporns et al., 2005), in the assessment of neurological diseases (Griffa et al., 2013), and in the study of the link between structural and functional connectivity (Messé et al., 2015). However, the development of the field is limited by the lack of a gold standard to test and compare the wide range of methodologies available for processing and analyzing dMRI.
Large efforts have been devoted to the development of physical phantoms (Lin et al., 2001; Campbell et al., 2005; Perrin et al., 2005; Fieremans et al., 2008; Tournier et al., 2008). Côté et al. (2013) conducted a thorough review of tractography methodologies using the so-called FiberCup phantom (Poupon et al., 2008; Fillard et al., 2011). These phantoms are appropriate to evaluate the angular resolution in fiber crossings and the accuracy of direction-independent scalar parameters in very simplistic geometries. Digital simulations are increasingly popular because the complexity of whole-brain tractography cannot be accounted for with the materials and methodologies currently available to build physical phantoms. Early digital phantoms started with the simulation of simple geometries (Basser et al., 2000; Gössl et al., 2002; Tournier et al., 2002; Leemans et al., 2005), also to evaluate the angular resolution. These tools generally implemented the multi-tensor model (Alexander et al., 2001; Tuch et al., 2002) to simulate fiber crossing, fanning, kissing, etc. Close et al. (2009) presented the Numerical Fiber Generator, a software tool to simulate spherical shapes filled with digital fiber tracts. Caruyer et al. (2014) proposed Phantomas to simulate any kind of analytic geometry inside a sphere. Phantomas models diffusion by a restricted and a hindered compartment, similar to Assaf and Basser (2005). Wilkins et al. (2015) proposed a whole-brain simulated phantom derived from the voxel-wise orientation of fibers averaged from real dMRI scans and the multi-tensor model with a compartment of isotropic diffusion. Neher et al. (2014) proposed FiberFox, a visualization software to develop complex geometries and their analytical description. Once the geometries are obtained, the software generates the corresponding dMRI signal with a methodology very close to that implemented in Phantomas.
An interesting outcome of FiberFox is the phantom dataset (see footnote 1) created for the Tractography Challenge held at ISMRM 2015. This dataset was derived from the tractography extracted from one Human Connectome Project (HCP, Van Essen et al., 2012) dataset. In the tractogram, 25 fiber bundles of interest were manually segmented by experts. Using FiberFox, the segmentation of each bundle was mapped to an analytical description, from which the signal was finally simulated.
In this data report we present Diffantom, an in silico dataset to assess tractography and connectivity pipelines using real dMRI data as the source of microstructural information. Diffantom is inspired by the work of Wilkins et al. (2015), with two principal novelties. First, since we use a dataset from the HCP as input, data are already corrected for the most relevant distortions. The second improvement is a more advanced signal model to generate the phantom, using the hindered and restricted diffusion model of Phantomas. As a result, we provide a whole-brain digital phantom of dMRI data with structural information derived from an HCP dataset. We also openly release the diffantomizer workflow, the software package necessary to generate custom diffantoms. Diffantom was originally designed for the investigation of susceptibility-derived distortions, a typical artifact that produces geometrical warping in certain regions of dMRI datasets. In Esteban et al. (2014) we addressed this phenomenon and concluded that the connectivity matrix of Phantomas was not dense enough to evaluate the integration of correction methods in pipelines for the connectome extraction.
1 Available at: http://www.tractometer.org/ismrm_2015_challenge/.

Microstructural Model
The simulation process relies on a microstructural model derived from real data. On one hand, the diffantomizer workflow requires up to five fraction maps $\{T_j \mid j \in \{1, \ldots, 5\}\}$ of free- and hindered-diffusion (see Figure 1A). These compartments are derived from the macroscopic structure of tissues within the brain, specified in the following order: cortical gray matter (cGM), deep gray matter (dGM), WM, cerebrospinal fluid (CSF), and abnormal tissue. On the other hand, the restricted-diffusion compartments are specified by up to three volume fractions $\{F_i \mid i \in \{1, 2, 3\}\}$ of three single fiber populations per voxel, along with their corresponding direction maps $\{V_i \mid i \in \{1, 2, 3\}\}$.
The process to obtain the microstructural model from one dataset of the HCP can be described as follows (see also Figure 1B):
(1) The fiber orientation maps $\{V_i\}$ and their corresponding estimations of volume fraction $\{F'_i\}$ are obtained using the ball-and-stick model for multi-shell data of BEDPOSTX (Bayesian Estimation of Diffusion Parameters Obtained using Sampling Techniques, modeling crossing (X) fibres; Jbabdi et al., 2012) on the dMRI data. The HCP recommends BEDPOSTX to reconstruct their data (Glasser et al., 2013). A further advantage is that BEDPOSTX exploits the multi-shell acquisitions of the HCP while operating at whole-brain level.
(2) A fractional anisotropy (FA) map is obtained after fitting a tensor model with MRTrix. As we shall see in the Appendix, the FA is used to infer $F_1$ (the fraction map of the most prevalent fiber), avoiding the extremely noisy estimation of $F'_1$ performed by BEDPOSTX in the previous step.
(3) The original fiber fractions $\{F'_i\}$ and the FA map are denoised with a non-local means filter included in dipy (Garyfallidis et al., 2014). This step strongly smooths the maps while preserving the edges. Smoothing is also beneficial in simplifying the voxel-wise diffusion model.
(4) The macrostructural fractions $\{T'_j\}$ are extracted from the T1-weighted image of the dataset, using standard FSL segmentation tools (Jenkinson et al., 2012).
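The FA map of step (2) is a scalar summary of the fitted tensor. As a minimal sketch (not the MRTrix implementation), the standard FA formula can be written in terms of the three tensor eigenvalues; the function name and the example eigenvalues below are illustrative assumptions.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Standard FA of a diffusion tensor, computed from its eigenvalues.

    Illustrative helper only: the name and signature are assumptions,
    not part of MRTrix or the diffantomizer.
    """
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0.0:
        # Empty voxel (e.g., background): define FA as zero.
        return 0.0
    return math.sqrt(0.5 * num / den)


# Isotropic tensor: no preferred direction, FA is 0.
fa_isotropic = fractional_anisotropy(1.0e-3, 1.0e-3, 1.0e-3)
# Ideal "stick": diffusion along a single axis only, FA is 1.
fa_stick = fractional_anisotropy(1.0e-3, 0.0, 0.0)
# Strongly anisotropic, axially symmetric tensor: FA is approximately 0.90.
fa_fiber = fractional_anisotropy(2.2e-3, 0.2e-3, 0.2e-3)
```

FA is bounded between 0 and 1, which makes it a convenient starting point for a volume fraction; the actual mapping from FA to $F_1$ is the one described in the Appendix and is not reproduced here.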

Diffusion Signal Generation
Once a microstructural model of the subject has been synthesized, the fiber orientation maps $\{V_i\}$ are weighted by the fiber-fraction maps $\{F_i\}$ and projected onto a continuous representation of the fiber orientation distributions (FODs). A close-up showing how the FODs map looks is presented in Figure 1B. The single fiber response is a Gaussian diffusion tensor with axial symmetry and eigenvalues $\lambda_1 = 2.2 \cdot 10^{-3}\ \mathrm{mm^2\,s^{-1}}$ and $\lambda_{2,3} = 0.2 \cdot 10^{-3}\ \mathrm{mm^2\,s^{-1}}$. The resulting FODs map is then combined with the free- and hindered-diffusion compartments corresponding to $\{T_j\}$. The free-diffusion compartment corresponds to the CSF fraction map $T_4$ and is modeled with isotropic diffusivity $D_\mathrm{CSF} = 3.0 \cdot 10^{-3}\ \mathrm{mm^2\,s^{-1}}$. The hindered-diffusion compartments correspond to $\{T_1, T_2, T_3\}$ and are also modeled with isotropic diffusivities $D_\mathrm{cGM} = 7.0 \cdot 10^{-4}$, $D_\mathrm{dGM} = 9.0 \cdot 10^{-4}$, and $D_\mathrm{WM} = 2.0 \cdot 10^{-4}\ \mathrm{mm^2\,s^{-1}}$, respectively. All these diffusivity values (and those of the single-fiber response) can be modified by the user with custom settings. The restricted- and hindered-diffusion compartments are then fed into Phantomas and the final dMRI signal is obtained. By default, diffusion data are generated using a scheme of 100 directions distributed in one shell with uniform coverage (Caruyer et al., 2013). Custom one- or multi-shell schemes can be generated by supplying the tables of corresponding vectors and b-values. Rician noise is also included in Phantomas, and the signal-to-noise ratio (SNR) can be set by the user. The default value for SNR is 30.0.
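To make the combination of compartments concrete, the following is a minimal single-voxel sketch of a multi-compartment signal with Rician noise. It is not the Phantomas implementation: the function names, the way fractions are summed, the noise definition (sigma = S0 / SNR) and the b-value used in the example are all assumptions made for illustration. Only the diffusivities and the default SNR are taken from the text.

```python
import math
import random

# Default diffusivities quoted in the text, in mm^2/s.
LAMBDA_PAR = 2.2e-3   # axial eigenvalue of the single-fiber response
LAMBDA_PERP = 0.2e-3  # radial eigenvalues of the single-fiber response
D_ISO = {"cGM": 7.0e-4, "dGM": 9.0e-4, "WM": 2.0e-4, "CSF": 3.0e-3}


def voxel_signal(bval, grad, fibers, tissues, s0=1.0):
    """Noise-free signal for one voxel and one unit gradient direction.

    fibers:  list of (volume fraction, unit direction) pairs
    tissues: dict mapping a tissue name in D_ISO to its volume fraction
    (Hypothetical helper; fractions are assumed to sum to one.)
    """
    signal = 0.0
    for fraction, direction in fibers:
        # Apparent diffusivity of an axially symmetric tensor along grad.
        cos_angle = sum(g * v for g, v in zip(grad, direction))
        adc = LAMBDA_PERP + (LAMBDA_PAR - LAMBDA_PERP) * cos_angle ** 2
        signal += fraction * math.exp(-bval * adc)
    for name, fraction in tissues.items():
        # Isotropic (free or hindered) compartments.
        signal += fraction * math.exp(-bval * D_ISO[name])
    return s0 * signal


def add_rician_noise(signal, snr=30.0, s0=1.0, rng=random):
    """Magnitude of the signal plus complex Gaussian noise.

    Assumes sigma = s0 / snr, which may differ from Phantomas.
    """
    sigma = s0 / snr
    real = signal + rng.gauss(0.0, sigma)
    imag = rng.gauss(0.0, sigma)
    return math.hypot(real, imag)


# One fiber population along x (60%) plus hindered WM (40%);
# b = 1000 s/mm^2 is an arbitrary example value.
fibers = [(0.6, (1.0, 0.0, 0.0))]
tissues = {"WM": 0.4}
parallel = voxel_signal(1000.0, (1.0, 0.0, 0.0), fibers, tissues)
perpendicular = voxel_signal(1000.0, (0.0, 1.0, 0.0), fibers, tissues)
noisy = add_rician_noise(parallel, snr=30.0, rng=random.Random(0))
```

As expected for a restricted compartment, the signal is attenuated more strongly when the gradient is parallel to the fiber than when it is perpendicular to it.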

Implementation and Reproducibility
We also provide the diffantomizer workflow, the software package used to generate diffantoms, so that users can regenerate similar datasets with different parameters. This workflow, presented in Figure 1, is implemented using nipype (Gorgolewski et al., 2011) to ensure reproducibility and usability.

Interpretation and Recommended Uses
To illustrate the features of Diffantom, the example dataset underwent a simplified connectivity pipeline including constrained spherical deconvolution (CSD) and probabilistic tractography from MRTrix (Tournier et al., 2012). CSD was reconstructed using 8th-order spherical harmonics, and tractography was run with $1.6 \cdot 10^6$ seed points evenly distributed across a dilated mask of the WM tissue. Figures 2A1,A3 show the result of the tractography obtained with this pipeline for the original Diffantom and a distorted version. Finally, we applied tract querier (Wassermann et al., 2013) to segment some fiber bundles such as the corticospinal tract (CST) and the forceps minor (see Figures 2A2,A4). In particular, due to its location near the orbitofrontal lobe, the forceps minor is generally affected by susceptibility distortions.
We recommend Diffantom as ground-truth in verification and validation frameworks (Figure 2B) for testing pipelines. Diffantom is applicable in the unit testing of algorithms, the integration testing of modules in workflows, and the overall system testing. Some potential applications follow:
• Investigating the impact of different diffusion sampling schemes on the local microstructure model of choice and on the subsequent global tractography outcome. Since the gradient scheme can be set by the user, Diffantom can be seen as a means to translate the so-called b-matrix of the source dataset to any target scheme.
• Assessment of sensitivity and robustness to imaging artifacts (noise, partial volume effect and CSF contamination, susceptibility-derived warping, eddy-current-derived distortions, etc.) at unit, integration and systems testing levels.
• Using Diffantom as in Figure 2B, it is possible to apply binary classification measures to evaluate the resulting connectivity matrix. Considering the connectivity matrix of the reference Diffantom and the resulting matrix of the test Diffantom, the receiver operating characteristic (ROC) of the pipeline can be characterized.
• Simulation of pathological brains by altering the microstructural model accordingly (e.g., as tumors were simulated in Kaus et al., 2000).
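The binary classification measures mentioned in the list above can be sketched as follows. This is an illustrative assumption about how such an evaluation might be set up, not the procedure used by the authors: the reference connectivity matrix is taken as already binarized, the test matrix holds connection weights, and each threshold on those weights yields one point of the ROC curve. The function name and the toy matrices are hypothetical.

```python
def roc_points(reference, test, thresholds):
    """Return one (FPR, TPR) pair per threshold.

    reference: square matrix (list of lists) of 0/1 ground-truth connections
    test:      square matrix of connection weights from the tested pipeline
    Only the upper triangle is used (undirected network, no self-loops).
    """
    n = len(reference)
    pairs = [
        (reference[i][j], test[i][j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    positives = sum(1 for ref, _ in pairs if ref)
    negatives = len(pairs) - positives
    points = []
    for thr in thresholds:
        # A connection is "detected" when its weight reaches the threshold.
        tp = sum(1 for ref, w in pairs if ref and w >= thr)
        fp = sum(1 for ref, w in pairs if not ref and w >= thr)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((fpr, tpr))
    return points


# Toy 4-node example (made-up numbers, for illustration only).
reference = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
test = [
    [0.0, 0.9, 0.2, 0.4],
    [0.9, 0.0, 0.7, 0.6],
    [0.2, 0.7, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
]
curve = roc_points(reference, test, thresholds=[0.1, 0.5, 0.8])
```

Lowering the threshold recovers more true connections at the cost of more false ones, which is the trade-off the ROC curve summarizes.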
In order to exemplify one of these intended uses, we also release a Diffantom including the susceptibility-derived distortion in simulation. These two images belong to a broader dataset, automatically generated, used in a study to quantify the impact of susceptibility distortions and correction methods on the connectome extraction (Esteban, 2015, Chapter 5). In this study, three widely-used correction methods are compared in a reference framework of several Diffantoms with realistic and controlled distortions. This context provides a useful resource to characterize the impact of susceptibility distortion on the final connectivity network and allows the evaluation of the different correction methodologies available.

DISCUSSION
Whole-brain, realistic dMRI phantoms are necessary in the developing field of structural connectomics. Diffantom is a derivative of Wilkins et al. (2015) in terms of simulation methodology, with two major advances. First, Diffantom benefits from the correctness of the minimally preprocessed data (Glasser et al., 2013) released within the HCP. Wilkins et al. (2015) explicitly state that their original data were not corrected for certain artifacts, and thus their generated data are affected correspondingly. Second, Diffantom implements the hindered and restricted compartments model (Assaf and Basser, 2005), which is a more complete model than the multi-tensor diffusion model. A possible competitor to Diffantom is the phantom generated with FiberFox for the Tractography Challenge in ISMRM 2015. Similarly to Diffantom, the organizers used an HCP subject as source of structural information. While this phantom is designed for the bundle-wise evaluation of tractography, with the scores defined in the Tractometer (Côté et al., 2013) such as geometrical coverage, valid connections, invalid connections, missed connections, etc., Diffantom is intended for the connectome-wise evaluation of results, yielding a tractography with a large number of bundles. Therefore, Diffantom and the FiberFox phantom are complementary, as the hypotheses that can be investigated are different. Moreover, Diffantom does not require costly manual segmentation of bundles, which is highly demanding in terms of physiology expertise and operation time. The software workflow released with this data report (the diffantomizer) ensures the reproducibility of Diffantom and enables the generation of custom diffantoms. The diffantomizer is designed for, but not limited to, using HCP datasets as source of structural information.

CONCLUSION
Diffantom is a whole-brain digital phantom generated from a dataset from the Human Connectome Project. Diffantom is presented here to be openly and freely distributed along with the diffantomizer workflow to generate new diffantoms. We encourage the neuroimage community to contribute with their own diffantoms and share them openly. The phantom is designed to be used as ground-truth information in evaluation frameworks, to implement unit tests of algorithms, to check the integration of processing units within pipelines, or to validate complete workflows. For instance, in order to evaluate artifacts, a perturbation can be induced in the microstructural model or after simulation to provide reference and test datasets.
Frontiers in Neuroinformatics | www.frontiersin.org

DATA SHARING
The first Diffantom and its distorted version are available under the Creative Commons Zero licence (CC0) using the Dryad Digital Repository (doi:10.5061/dryad.4p080). The package is organized following the BIDS standard. The associated software to "diffantomize" real dMRI datasets is available at https://github.com/oesteban/diffantom under an MIT license. Phantomas is available at https://github.com/ecaruyer/Phantomas under the revised-BSD license.

AUTHOR CONTRIBUTIONS
All the authors contributed to this study. OE designed the data generation procedure, implemented the processing pipelines and generated the example dataset. EC implemented Phantomas and helped integrate the project with the simulation routines. OE, EC, and AD thoroughly discussed and framed the aptness of the data in the community. AD, MB, ML, and AS interpreted the resulting datasets. MB, ML, and AS advised on all aspects of the study.

FUNDING
This study was supported by the Spanish Ministry of Science and Innovation (projects TEC-2013-48251-C2-2-R and INNPACTO XIORT), Comunidad de Madrid (TOPUS) and European Regional Development Funds, the Center for Biomedical Imaging (CIBM) of the Geneva and Lausanne Universities and the EPFL, as well as the Leenaards and Louis Jeantet Foundations.