Subject–Motion Correction in HARDI Acquisitions: Choices and Consequences

Diffusion-weighted imaging (DWI) is known to be prone to artifacts related to motion originating from subject movement, cardiac pulsation, and breathing, but also to mechanical issues such as table vibrations. Given the necessity for rigorous quality control and motion correction, users are often left to use simple heuristics to select correction schemes, which involves simple qualitative viewing of the set of DWI data, or the selection of transformation parameter thresholds for detection of motion outliers. The scientific community offers strong theoretical and experimental work on noise reduction and orientation distribution function (ODF) reconstruction techniques for HARDI data, where post-acquisition motion correction is widely performed, e.g., using the open-source DTIprep software (1), FSL (the FMRIB Software Library) (2), or TORTOISE (3). Nonetheless, effects and consequences of the selection of motion correction schemes on the final analysis, and the eventual risk of introducing confounding factors when comparing populations, are much less known and far beyond simple intuitive guessing. Hence, standard users lack clear guidelines and recommendations in practical settings. This paper reports a comprehensive evaluation framework to systematically assess the outcome of different motion correction choices commonly used by the scientific community on different DWI-derived measures. We make use of human brain HARDI data from a well-controlled motion experiment to simulate various degrees of motion corruption and noise contamination. Choices for correction include exclusion/scrubbing or registration of motion corrupted directions with different choices of interpolation, as well as the option of interpolation of all directions. The comparative evaluation is based on a study of the impact of motion correction using four metrics that quantify (1) similarity of fiber orientation distribution functions (fODFs), (2) deviation of local fiber orientations, (3) global brain connectivity via graph diffusion distance (GDD), and (4) the reproducibility of prominent and anatomically defined fiber tracts. Effects of various motion correction choices are systematically explored and illustrated, leading to a general conclusion of discouraging users from setting ad hoc thresholds on the estimated motion parameters beyond which volumes are claimed to be corrupted.


INTRODUCTION
Diffusion-weighted (DW)-MRI enables probing the fiber architecture of biological tissues -in vivo -by encoding the microscopic direction and speed of the diffusion of water molecules (4), while reflecting the amount of hindrance experienced by such molecules along the axis of the applied diffusion gradient due to barriers and obstacles imposed by micro-structures (5). Today, diffusion tensor imaging (DTI) is the method of choice for most neuroimaging studies, e.g., autism (6), schizophrenia (7), and Huntington's disease (8). Nonetheless, DTI assumes a homogeneous axon population inside a single voxel (9) and fails at modeling more realistic heterogeneous populations. High angular resolution diffusion imaging (HARDI) (10), on the other hand, allows the diffusion acquisition to focus on the angular component of the DW signal using strong gradients and long diffusion times (5), while revealing the intra-voxel orientational heterogeneity, such as crossing and merging fiber bundles. The promising potential of HARDI-based DW-MRI in describing fiber tracts within the human brain comes with a price tag of a wide variety of artifacts related to the gradient www.frontiersin.org system hardware, pulse sequence, acquisition strategy, and subject motion (11). Such artifacts render the quality of diffusion imaging questionable and reduce the accuracy of findings when left uncorrected (1).

MOTION ARTIFACTS
In today's clinical DW-MRI acquisitions, the presence of the long and strong gradient pulses have made diffusion MRI more sensitive to the detrimental effects of subject motion than other MRI techniques (9,12,13). During a scanning session, the degree of a patient's cooperation may vary: elderly people who may become uncomfortable during large scanning sessions, patients in pain who become restless and agitated during a scan, and unsedated pediatric subjects who will not cooperate long enough to be imaged without motion artifacts. Hence, it is safe to assume that there are always motion artifacts in any given DW-MRI acquisition due to the increased likelihood of involuntary subject motion; especially with HARDI acquisitions, which use a large number of gradient directions resulting in longer scan times. A proof-of-concept of this hypothesis is presented in section 1.
Motion artifacts range from physiological motion (e.g., cardiac pulsation and respiration) to physical (voluntary or involuntary) bulk movement by the patient (14). Physiological motion can be controlled by gating or in the sequence design (15), but the patient bulk movement during the diffusion-encoding gradient pulses leads to severe signal perturbation (16)(17)(18), which results in a significant signal phase shift or signal loss (19). The effects of bulk motion are twofold: slow bulk motion can cause misalignment of diffusion data between subsequent gradient applications (i.e., DWI-volumes), resulting in an underestimation of diffusion anisotropy (4), whereas fast bulk motion during the application of a single diffusion gradient causes inhomogeneous signal dropout/attenuation artifacts in the diffusionweighted images. This dropout effect arises due to signal dephasing within the voxels (13,14), which is the very phenomenon that gives rise to the DW-MRI contrast, leading to an overestimation of diffusion anisotropy (4). Although misalignment can be tackled by registration-based correction methods (20), the signal dropout due to intragradient motion will persist (4), where such images are identified and excluded from further processing and/or scheduled for reacquisition during the same scan (13,14,(21)(22)(23). Left uncorrected, motion-corrupted datasets introduce bias in the subsequent findings due to the induced variability of diffusion MRI measurements, while affecting the statistical properties of diffusion derived measures in heterogeneous brain regions.

MOTION CORRECTION CHOICES
The identification and elimination of slow bulk motion artifacts in HARDI data, which are characterized by a high bvalue and low signal-to-noise (SNR) ratio, still remains a challenge. In order to allow correction approaches to proceed with reasonable accuracy, motion occurring between diffusion gradients can be treated as if it occurred all at once (24).
Motion effects can be reduced by real-time motion control during the acquisition (a.k.a. prospective motion correction) (25)(26)(27), where the acquisition and the source of motion are synchronized, so that the data are never corrupted. In addition, the development of accelerated acquisition methods [e.g., Ref. (28)] can reduce the duration of a scan to minimize the susceptibility of subject motion. A comfortable padding can also be used to minimize head motion while urging the participant to remain without movement (11). Nonetheless padding is not always effective in studies involving infants [e.g., autism diagnosis (29)], where remaining still in the scanner may be more challenging. Nevertheless, prospective methods for motion correction might affect the acquisition time due to the reacquisition of motion-corrupted gradients (14). Such methods might also require external optical tracking systems (23), free-induction decay navigators (26), or volumetric navigators (30), which are not always available on current scanners (27), coupled with the need of time-consuming calibration steps prior to their use (14). Furthermore, rapid modification of diffusion gradients may induce eddy current artifacts (13), and there is no guarantee that the head will move back to the original position.
Motion compensation can also be performed as a postprocessing step after acquisition, i.e., retrospective, to guarantee voxel-wise correspondence between different DWIs referring to the same anatomical structure. A common practice is to heuristically select transformation parameter thresholds for detection of motion outliers, where registration and interpolation are applied to gradient directions that are claimed to be corrupted. Software packages for image-based registration of DWIs are becoming readily available, e.g., FSL-MCFLIRT (2,31), the Advanced Normalization Tools (ANTS) (32), TORTOISE (3), and BRAINSFit (33) employed in DTIPrep (1).
A typical retrospective motion correction algorithm involves two stages (20): first, finding the global transformation parameters that would transform all DWIs to the same coordinate frame, and then, applying the estimated transformations to the diffusion data. Solving for the transformation parameters usually involves rigidly registering the DWIs to a reference volume representing the same anatomical structure, but without being contaminated by motion artifacts. Examples of such a reference include a T2-weighted image (16), or a non-diffusion-weighted image (a.k.a baseline with b-value = 0) due to its high SNR and lesser vulnerability to eddy current distortion (34), where the difference in intensity profiles is compensated for using normalized mutual information similarity measure. Another alternative is a model-based reference volume computed for each diffusion-weighted image based on tensor fitting (35,36). Model-based motion correction implicitly assumes that the original position defined by the baseline volume is the reference position to be aligned to Sakaie and Lowe (20). Recently, it has been shown that model-based motion correction becomes a more powerful choice for correcting higher b-value diffusion imaging, which does not contain enough anatomical features to be registered accurately (36).
Applying the estimated transformation parameters is performed using interpolation, which computes intensities at transformed voxel coordinates as a weighted sum of the scaled intensities at surrounding voxels. The diffusion gradient vectors are also reoriented to incorporate the rotational component of subject motion (37). Interpolation is usually carried out by an exact fit of a continuously defined model to discrete data samples. Nonetheless, this exact fit is less appropriate when data are noise-corrupted, since the model is forced to fit the noise too. Although using regularized interpolation can tackle noisy data, it is only preferable to applying denoising followed by standard interpolation under the assumption that the signal is a stationary Gaussian process (38); a situation that is not applicable for diffusion-weighted images, which are contaminated by Rician noise. Based on the central limit theorem, the (weighted) average of a large set of i.i.d. samples tends to follow a normal distribution. Thus, interpolation between Rician distributed samples might change the distribution toward a Gaussian PDF (39). We can, therefore, argue that the denoising process decreases the effect of standard interpolation on altering the underlying data distribution.
Another retrospective approach is to cast motion correction as an outlier rejection process, ranging from simply excluding one or more gradients bearing strong motion artifacts beyond acceptable levels of motion (11,14,40), to statistical methods for detecting and discarding voxel-wise diffusion measurements as outliers (17,41,42). Usually discarding entire scans (a.k.a motion scrubbing in functional MRI) either can be performed by visual inspection or based on predefined thresholds on estimated motion parameters (4). Nevertheless, removing gradients limits the ability to reconstruct crossing fibers, especially at small separation angles, due to the decreased number of distinct gradient directions needed for diffusion reconstruction. Moreover, scrubbing would introduce intersubject SNR and bias differences that would in turn affect subsequent statistical analysis (1). On the other hand, local exclusion of corrupted voxels for robust diffusion reconstruction in the presence of outliers is based on the deviation of the observed measurements (usually after motion correction) from the assumed diffusion model. Using these approaches for motion correction itself would mingle the effect of being an outlier to an assumed model with that of being corrupted due to motion. Further, local exclusion would lead to a different number of DWIs locally available for each voxel, complicating subsequent analysis to avoid bias due to different SNR values for different brain regions (1).
A common concern with retrospective methods in clinical studies, whether registration-based and/or outlier-based, is that data with different levels of motion will be subject to different schemes of motion correction. For instance, patients may show more motion than controls, or sedated subjects may be different from non-sedated. Applying different motion correction schemes could introduce a confounding factor for statistical analysis of populations that show different motion patterns. Nonetheless, eyeballing the acquired/preprocessed DWIs prior to proceeding to further analysis is highly recommended.

OBJECTIVE AND CONTRIBUTIONS
The lack of a comprehensive/rigorous quality control (QC) for HARDI datasets can result in considerable error and bias in subsequent analyses, which may affect research studies using these datasets. Most current software packages such as DTIPrep (1), TORTOISE (3), and FSL (2), which offer various tools for processing and analysis of diffusion-weighted images, are mostly limited to DTI datasets, which are characterized by low b-values (i.e., higher SNR) and fewer gradients (i.e., shorter acquisition times). Nonetheless, special care is needed for HARDI datasets due to their low SNR and longer acquisition times, which increase the likelihood of subject motion. As a part of a thorough pipeline for HARDI-QC, this paper addresses the motion correction aspect for slow bulk motion where users often do not fully understand the consequences of different types of correction schemes on the final analysis, and whether those choices may introduce confounding factors when comparing populations. Therefore, the presented work is directed toward clear guidelines and recommendations to the standard users in practical settings.
The optimal preprocessing pipeline for HARDI sequences remains an open question and a challenge for real data. Questions that might arise include: is there a threshold that would identify a motion-corrupted volume? How sensitive are HARDI reconstructions to such a predefined threshold? What is the impact of various motion correction schemes on subsequent HARDI-based reconstructions and tractography? So far, these questions have received, surprisingly, little attention in various DW-MRI studies of clinical populations. This study, then, focuses on the effect of preprocessing schemes, in particular motion correction, commonly deployed as a post-acquisition step, on succeeding steps. We propose a comprehensive experimental framework (see Figure 1) that enables making use of human brain HARDI data from a well-controlled motion experiment to simulate various degrees of motion/noise corruption. The comprehensiveness is related to the systematic evaluation of the outcome of different motion correction choices commonly used by the scientific community on different DWI-derived measures. To our knowledge, this evaluation does not exist in the literature and has not been discussed in detail.
Choices for correction include exclusion or registration of motion corrupted directions, with different choices of interpolation, as well as the option of registration/interpolation of all directions versus corrupted directions only. The effect of denoising as a preprocessing step applied prior to motion correction is also investigated. Further, the choice of the reference volume used in the registration framework is also discussed. The comparative evaluation covers four metrics: (1) the similarity of fiber orientation distribution functions (fODFs) via Jensen-Shannon divergence (JSD), (2) the deviation of multiple fiber orientations at each voxel, (3) the global brain connectivity via graph diffusion distance (GDD), and (4) the reproducibility of seven anatomically defined fiber pathways via Cohen's Kappa statistics. On the basis of our findings, we recommend assuming that motion is inevitable, even subtle, in the acquired scans. Motion correction, therefore, needs to be applied to all gradient directions without relying heuristically on a threshold that determines a gradient direction to be claimed as motion corrupted.
www.frontiersin.org FIGURE 1 | A comprehensive experimental framework for subject motion simulation to systematically evaluate the outcome of different motion correction choices commonly used by the scientific community on HARDI-based reconstructions and tractography. (A) A human brain HARDI data were acquired from a well-controlled motion experiment. (a.1) Acquired DWIs were preprocessed to obtain nearly noise-free and motion-free datasets. (a.2) For automated tractography selection and the quantification of whole brain connectivity, a subject-specific unbiased atlas was constructed via DTI-derived data from HARDI sequences resulting in a tensor atlas, where we can define a detailed parcelation of neuroanatomical structures, and map it back to each raw scan. (B) Noticeable motion was then simulated by randomly mixing gradients from the acquired datasets. (C) Motion correction involves four main decision variables where each distinct combination of choices defines a correction scheme. (D) Reconstruction of a corrected or motion-free dataset entails reconstructing the voxel-wise fiber orientation distribution functions, detecting local (voxel-wise) fiber orientation, preforming whole brain tractography, and automatically selecting anatomical pathways. (E) The evaluation of the effect of a motion correct scheme has been investigated based on voxel-wise metrics, global brain connectivity metric, and tract-based metric.

MOTION IS INEVITABLE: PROOF-OF-CONCEPT
To back up our assumption that motion is omnipresent, we analyzed data from three healthy human phantoms (males 30-40 years old) visiting each of the four clinical sites (Chapel Hill, Philadelphia, St. Louis, and Seattle) as a part of the ACE-IBIS study [Autism Centers for Excellence, Infant Brain Imaging study (6)], using a total of six MRI systems (two sites using both research and hospital scanners). All study procedures were approved by the institutional review board at each clinical site, and informed, written consent was obtained for all participants. In addition, the traveling phantoms sign consent forms at each of the sites, as per their own institutional IRBs. The sites include the University of Washington, Seattle, the Washington University in St. Louis, the Childrens hospital of Philadelphia, and the University of North Carolina at Chapel Hill. Each subject was scanned twice on a 3 T Siemens Tim Trio scanner 1 with a strict calibration of image acquisition parameters. Test-retest reliability at each site was established with two scans within 24 h. The scans were acquired within 1 week to guarantee that there were no major brain changes over time. The scanning environment was well controlled. Comfortable padding was used to minimize head motion, and patients were urged to remain without movement. Eddy current was compensated for using a Twice Refocused Spin Echo 1 The protocol used a GRAPPA parallel imaging factor of 2 and a partial Fourier factor of 3/4, which does indeed result in non-Rician noise distributions. However, the effect of the noise distribution is expected to be relatively small at a b values of 2000 s/mm 2 [e.g., Ref. (74)] and we do not expect the difference in noise profile to affect our conclusions in terms of the motion correction schemes. Initially, we ran automated Quality Control on the DWIs via DTIPrep (1), which includes among other steps interlaced correlation analysis for detection and removal of fast bulk motion within a single DWI volume, where no quantitative within-gradient motion was detected. Inspired by Sakaie and Lowe (20), FSL-MCFLIRT (31) was then used to provide the rigid transformation matrix (i.e., six degrees of freedom) for each volume having the baseline image as the reference for motion correction and normalized mutual information as the cost function. It is worth noting that MCFLIRT employs a global-local hybrid optimization method for robust affine registration that is specifically tailored to brain images. Within a multiresolution framework, four scales were used (8, 4, 2, and 1 mm, i.e., supervoxel vs. subvoxel). At each scale, volumes 2 In our analysis, we opt to using a prospective approach [a Twice-Refocused Spin Echo (TRSE) sequence] for eddy current compensation in order not to introduce any alignment-based preprocessing before running motion correction that could already have a confounding effect (otherwise we would work on resampled and interpolated images before motion detection). Further, we have eye-balled the FA map of the acquired sequences (prior to motion correction) where the prominent edge artifact (regions of anomalous diffusion contrast resulted from misregistration of dissimilar materials) that should be visible in case of Eddy current distortion was almost entirely absent in the TRSE images. To further support our decision, we used FSL-MCFLIRT (31) to provide the affine transformation matrix (i.e., 12 degrees of freedom) to detect the scaling and skewing parameters, which might occur due to induced eddy current where the affine transformation matrix can be written as:

Frontiers in Neurology | Brain Imaging Methods
where R is the rotation matrix, s x , s y , s z ∈ R are the scaling parameters, a, b, c ∈ R are the skewing parameters and t x , t y , t z ∈ R are the translation parameters. Table S2 in Supplementary Material reports the average and standard deviation (along all gradient directions per dataset) of the estimated transformation parameters where all datasets tend to have a unit scale with minimal skewing values. These values confirm the decision of bypassing eddy current compensation in our analysis.
were resampled after initial filtering to reduce the effect of noise. Further, we tested motion correction based on denoised HARDI sequences using the Joint Rician LMMSE filter (43) implemented as part of 3D Slicer (www.slicer.org), and found that the quantified motion with and without noise reduction was very similar.
To quantify motion, we used the magnitude of the translation vector (in millimeters) as well as the axis-angle rotation representation (in degrees) (4). The boxplots in Figure 2 show the rotational and translational components of the motion being detected from a total of 24 DWI datasets, showing an average of 0.39°rotation and 0.61 mm translation. The graphs in Figure 2 illustrate the arbitrariness of a common calculation of percentage of motion correction to determine the number of affected scans, here shown as a function of thresholding on the estimated motion parameters. While this experiment attributes the estimated rotation and translation parameters to actual subject motion, a part of the experimentally obtained parameters may be due to some imaging/image-processing uncertainty and also to image differences due to anatomical properties of the object (e.g., tissue orientation) that make the images"look"different even if they were perfectly aligned. To backup our analysis, we conducted another experiment where we contaminated a single DWI dataset with two independent realizations of Rician noise such that the two generated DWI images were perfectly aligned because they were the exact same image. Then, we ran motion correction where all DWI images were aligned to the same baseline, we obtain similar motion parameters although we are registering two independent acquisitions of the same subject. We therefore conclude that the transformation parameter estimates from FSL-MCFLIRT (31) are resilient to noise and may primarily caused by subject motion during a DWI scan, or eventually also by relative motion between subject and scans if considering artifacts due to pulse sequence and scanner technology.

LIVING PHANTOM: ACQUISITION AND GOLD STANDARD GENERATION
Unlike conventional MRI, where realistic phantoms exist (44), there is no widely acceptable realistic DWI phantom for the www.frontiersin.org assessment of different processing tasks (43,45). Existing phantoms simulate crossing sections in two and three dimensions, but they are not representative of white matter complex architecture with multiple fiber crossing, bending, and branching. The lack of realistic phantoms motivates us to base our analysis on living (human) phantoms being scanned under well-controlled environments and propose a HARDI-based QC to yield motion-and noise-free datasets. Acquired DWIs were preprocessed (refer to Figure 1A) to obtain nearly noise-free and motion-free datasets according to the following pipeline, and therefore, to be used as a gold standard for reconstruction and tractography.

HARDI-based quality control
The QC process starts with identifying individual volumes having fast bulk (intra/within-gradient) motion using the signal dropout score proposed in Ref. (14). The score was computed for each slice in each volume, where slices with a score >1 were considered to have suspect signal dropout. Based on a zero-tolerance strategy, any volume having at least one slice with signal dropout was excluded from further analysis. It is worth noting that no within-gradient motion was detected in our phantom acquisitions. Each gradient was then independently denoised to reduce noise using the Rician LMMSE estimator with an 11 × 11 neighborhood (46) implemented in 3D Slicer (www.slicer.org) where the noise parameter is automatically estimated. Using DTIPrep (1), interslice brightness artifacts were detected via normalized correlation analysis between successive slices within a single DWI volume, where corrupted gradients were excluded before being streamed into the next steps. Further, interlaced correlation analysis (1) was used for detection and removal of Venetian blind artifacts (seen when motion occurs between the interleaved parts of an individual gradient volume) and fast bulk motion within a single DWI volume, where no quantitative within-gradient motion was detected.
For each DW-MRI scan, iterative FSL-MCFLIRT (20) was used to correct for intergradient subtle motion (<1°rotation and < 0.8 mm translation), with the baseline volume as the reference for rigid alignment (i.e., six degrees of freedom with normalized mutual information as the cost function). The corresponding diffusion-weighting gradient vectors were reoriented accordingly (37). To palliate the effect of spatial intensity inhomogeneities, N4 correction (47) was performed where the bias field was computed from the baseline volume and subsequently applied to all diffusion-weighted images. For further noise reduction, the Joint LMMSE (43) (www.slicer.org) was used to exploit the joint information from neighboring gradients from motioncorrected sequences. To avoid over-blurring, we used a 2 × 2 × 2 neighborhood with six neighboring gradients.

Atlas-guided parcelation
For automated tractography selection and the quantification of whole brain connectivity, we defined a subject-specific unbiased atlas via DTI-derived data from HARDI sequences belonging to the same subject/phantom. This results in a tensor atlas, where we can define a detailed parcelation of neuroanatomical structures, and map it back to each raw scan. This reduces registration variability between each phantom data when defining the parcelation in subject spaces. The full process entails atlas creation and parcelation definition, as detailed in the following.

Co-registration and atlas building.
To define a common reference space, our framework is centered around the creation of a DTI atlas, generated as an unbiased average atlas from the study dataset via a deformable atlas building strategy. Unbiased atlas building is used to provide one-to-one mapping between the image data and the template atlas, wherein the atlas is built from the population of data as the centered image with the smallest deformation distances. The overall registration framework, similar to what has been presented in Ref. (48), proceeds in four steps: (1) image preprocessing via skull-stripping and tensor estimation, (2) affine alignment, (3) unbiased diffeomorphic atlas computation via GreedyAtlas module in AtlasWerks 3 software (49), and (4) a refinement step via symmetric diffeomorphic registration using the advanced normalization tools -ANTS (32).
Image preprocessing. A brain masking is first performed on the baseline images using FSL-BET2 (brain extraction tool) (50) to remove all non-brain parts of the image. BET2 uses a surface model approach to robustly and accurately carry out the segmentation. We then model tensors using the brain masks from the initial DWI datasets by using weighted least squares estimation, and then extract related scalar maps such as fractional anisotropy (FA) images.
Affine alignment. The second step applies affine registration of baseline images to a previously defined baseline template. A multithreaded, coarse-to-fine registration scheme using mattes mutual information metric is employed in that regard (33). The transformations are applied to curvature FA maps. The use of curvature FA as feature to derive registration has initially been presented by Goodlett et al. (51). It is defined as the maximum eigenvalue of the Hessian of the FA image, therefore measuring image intensity curvature (second derivative) in the direction of largest curvature, which acts like a 3D ridge detector. It is computed by convolution of the FA image with a set of Gaussian second derivatives with a fixed aperture, proportional to the size of the white matter structures. The curvature feature image proved to be an efficient detector of the 3D manifold skeleton of major fiber bundles, which occur as tubular or sheet-like thin structures (similarly to the TBSS software), with the strongest response at their center. It is thus commonly used by our group when building population atlases to optimize correspondence of fiber tract geometries, and integrated into our freely distributed software package (48). The curvature FA maps are thus mapped to this template space, and then the intensity is rescaled via histogram matching.
Atlas building. We then use an unbiased deformable atlasbuilding procedure (52) that applies large deformation diffeomorphic metric mapping transformations to these intensity rescaled mapped curvature FA images. The procedure relates individual datasets to the subject-specific atlas template space by means of non-linear, invertible transformation. Tensor maps are transformed into the atlas space with tensor reorientation by the finite strain approach (53), taking into account both affine transformation and non-linear deformation. The transformed tensor images are finally averaged using the Riemannian framework proposed in Fletcher and Joshi (54), resulting in a three dimensional average tensor atlas.
Atlas refinement. An additional step is performed by direct symmetric diffeomorphic registration of initial FA images to the previously created DTI-FA atlas via the Advanced Normalization Tools -ANTS (32). In our experience, this dual stage procedure has been shown to produce a sharper atlas with improved registration accuracy, most likely attributable to the use of local normalized cross-correlation as the image similarity metric. Final affine transformation and deformation fields are then available from subject space to atlas space.

White matter parcelation.
We used the publicly available JHU-DTI-SS (a.k.a. "Eve") atlas described in Oishi et al. (55). Defined as a single subject template, it includes both structural (T1w, T2w) and DTI images with white matter map parcelations, defining 176 hand-segmented core and peripheral regions of interest (ROIs). A multithreaded, coarse-to-fine diffeomorphic registration scheme using the cross-correlation metric via ANTS is employed on FA images between the Eve atlas and the subjectspecific atlas. The computed deformation field is then applied to the Eve white matter label map. We can then map the parcelation, now defined in our subject atlas space, back to raw data in the initial image space, via the use of previously computed displacement fields. On a specific note, we concatenated the transformations from Eve atlas space to our initial images in order to directly map the parcelation and avoid the use of multiple interpolations. The white matter parcelation map is then defined both in the subject-specific atlas space and in each individual subject space.

Human motion simulation
As a pilot study, one human phantom was asked to be rescanned with his head tilted to simulate noticeable motion. The two datasets, after being QCed (see 1), were then used to construct motion-corrupted sequences (see Figure 1B). Based on the alignment of the baseline images of the two scans (original and tilted) using FSL-MCFLIRT, about 12°of rotation and 7 mm of translation were detected, whereas <1°of rotation and 0.8 mm of translation were found when aligning individual DWIs to their corresponding baseline image. It is worth noting that the quantified motion between the acquired datasets (i.e., untilted versus tilted brains) can be classified as severe subject motion (36). We then arbitrarily considered the first out of the two scans as the "motion-free" sequence and used it as a reference for performance evaluation of different motion correction schemes. A random percentage of DW images (10, 30, 50, 70, and 90%, each with five distinct random sets of gradient directions) drawn from the second scan (tilted brain) were mixed with the first scan to construct 25 motion-corrupted datasets. Noisy sequences were generated by simulating Rician noise based on seven levels of SNRs from 4 to 20 (56), yielding 175 (5 experiments × 5 corruption percentages × 7 SNR levels) sequences.

Motion correction schemes
Correction for subject motion involves four main decision variables (see Figure 1C), where each distinct combination of choices defines a motion correction scheme. The first variable is which reference volume is to be used in the alignment process. Two options are available (20): baseline-based [e.g., Ref. (16)] and model-based [e.g., Ref. (35,36)]. In this context, we use the FMAM (Fit Model to All Measurements) method (35) where target images for registration were generated by first fitting the diffusion tensor to the DWIs, followed by diffusion simulation to provide target images of similar contrast to the DWIs. Notice that with >50% motion corrupted, model-based reconstruction infers the spatial position/orientation from the gradients corresponding to the tilted brain due to its majority (i.e., gradients of the untilted brain are considered the motion-corrupted directions). Therefore, with model-based correction for sequences having more than 50% corrupted directions, the tilted brain was used as a reference for performance evaluation.
The second variable denotes whether the correction is performed based on raw or denoised DWIs, where the denoising process should not take into account joint information between diffusion gradients due to motion corruption. In our experiments, we denoised motion-corrupted sequences using the Rician LMMSE estimator (46), where each gradient was independently denoised.
The third variable entails the mode of correction, i.e., registration-based versus outlier-based. The first choice explores two options: (1) only aligning and interpolating the corrupted gradient directions to mimic the situation where a predefined motion parameter threshold is used to claim whether a DWI volume is motion-corrupted, (2) assuming there is always motion, forcing the alignment and interpolation of all DWI volumes. Note that both options involve the reorientation of the diffusion gradient vectors corresponding to the corrupted volumes (37) to incorporate the rotational component of subject motion. In the second choice, i.e., outlier-based, we mimic the motion scrubbing approach, where we exclude the affected gradient directions from subsequent computations (i.e., diffusion profile reconstruction and tractography). Eventually, the interpolation step in the registration-based choices introduces the fourth variable where we study the impact of using trilinear and sinc interpolants.
It is important to stress that, in our motion simulation paradigm (i.e., randomly mixing DW volumes from a tilted-brain dataset), the identity of the motion-corrupted directions is known a priori without any use of parameters. This prior information is used via the outlier-based correction, as well as the interpolate corrupted directions choices. Nonetheless, in practice, this a priori information corresponds to heuristically set thresholds on the estimated motion parameters beyond which volumes are claimed to be corrupted/outliers. For example, a rotation threshold of 0.5°a nd a translation threshold of about one voxel spacing are set by default in DTIprep (1). www.frontiersin.org

RECONSTRUCTION AND TRACTOGRAPHY
The reconstruction and whole brain tractography were computed for the motion corrected sequences as well as the motion-free sequences [gold standard generated in Section 2, followed by automatic tractography selection for seven major fiber bundles (see Figure 1D)].
We employed the constrained spherical deconvolution (CSD) technique (57) to reconstruct fiber orientation distributions functions (fODFs) from the DWI data using the diffusion imaging Python (DiPy) library (58). The fiber response function was estimated from the corpus callosum region, defined by the white matter parcelation (see 2), where it is known to have single fibers. In particular, we used an ROI at the center of the corpus callosum and of a radius that would include all its voxels. The response function was estimated in that region from the voxels with FA higher than 0.7.
Part of our analysis is based on comparing brain connectivity graphs, which are represented as weighted graphs and computed from fiber tractography results. Whole brain tractography was performed using the EuDX deterministic tracking technique (59), which is implemented in the DiPy library (58), using random seeding inside the brain region and a turning-angle threshold of 30°b etween two connected voxels [as suggested by Parizel et al. (60) to provide sufficient fiber density while minimizing the number of spurious tracts].
To extract brain connectivity graphs from the fiber tractography results, we used the 176 core and peripheral ROIs defined in the white matter parcelation (see 2). Let N ij denote the total number of streamlines connecting the i-th and j-th ROIs, each with length l ij k ∀k ∈ [1, N ij ], and the edge weights w ij computed as follows (61): . The normalization by the tracts length gives a higher connection strength to short tracts to compensate for the signal attenuation as a function of tract length. It is worth noting that the concept of using the connection strength or other measures to weight the graph edges was previously discussed in several papers [e.g., Ref. (62,63)].
For tract-based analysis, an automatic tractography selection method was performed to select a subset of detected tracts from the whole brain tractography result corresponding to a specific white matter structure. Starting from the Eve-atlas-based white matter parcelation map defined in the subject space (see 2), the pass-through and not-pass-through volumes of seven fundamental fiber bundles (left and right hemispheres) were defined. To remove fibers that do not belong to the pathway of interest, we used the geometrical constraints specific for different fiber bundles as defined in Ref. (64), where the anatomical characteristics of these fiber bundles are defined in Ref. (65). We report the matching results from seven major fiber bundles: corpus callosum (CC), cingulum of the cingulate gyrus (CG), corticospinal tract (CST), fornix (FX), inferior fronto-occipital tract (IFO), inferior longitudinal fasciculus (ILF), and uncinate fasciculus (UNC).

MOTION CORRECTION CONSEQUENCES: EVALUATION METRICS
The influence of various motion correction choices on subsequent reconstruction and tractography is evaluated according to voxel-based, global connectivity-based as well as tract-based metrics (see Figure 1E), detailed as follows.

Voxel-based metrics
In order to measure similarities between the original motion-free fODFs and the fODFs corresponding to the motion corrected images, we use the Jensen-Shannon divergence (JSD), which has been used to quantify differences between ODFs in various studies, e.g., Ref. (66,67). Given two probability distributions P and Q, the JSD metric is defined as follows: where M = (P + Q)/2 and D KL is the Kullback-Leibler divergence. In our case, P and Q are represented as discrete distributions; therefore, the KL divergence takes the following form: where i is the discrete sample index. The JSD is for PDFs, but we compute it for normalized fODFs. We believe that it is a good measure since it reveals subtle changes in PDFs so we can also keep track of changes in fiber volumes as well as orientations.
In addition to comparing fODFs, we are interested in quantifying local deviations in fiber orientations due to motion correction. Since brain connectivity maps are inferred by tracking local fiber orientations extracted from fODFs, distortions in those directions may lead to unreliable brain connectivity maps. Therefore, it is important to study the impact of motion correction on fiber orientations by directly comparing the local fiber orientations before and after correction. To that end, we use the mean angular deviation measure θ defined as follows: where N is the number of fibers compared, and v k i and v k j correspond to the orientations of fiber k, with and without motion correction. Before averaging the deviations, we match the fibers, such that fiber j has the closest direction to fiber i. If the number of fibers is different, we compare the fibers that are present in both voxels. For example, if we have three fibers after motion correction, whereas before correction there were only two, we compare the two closest fiber directions. The fiber orientations were computed using the DiPy peak extraction tool (with 0.4 relative peak threshold and 20°minimum separation angle). We allowed up to five orientations in each voxel (N = 5). Since general image transformation does not necessarily preserve the original ordering of the fiber orientations, we first match the fibers based on the angular distance between each pair before computing the mean deviation.

Global connectivity-based metric
Once the brain connectivity graphs were generated for the different sequences, we compared them by means of the graph diffusion distance (GDD) metric, which has been proposed in Ref. (68). The GDD is a novel distance measure for comparing weighted graphs, which takes into account the graph structure in addition to the edge weights, compared to the Frobenius norm, which is sensitive Frontiers in Neurology | Brain Imaging Methods only to the edge weights. For an explanation of the differences between the GDD and the Frobenius norm, see the Barbell graph example in Ref. (68).
The GDD is based on the diffusion maps framework (69). Let A 1 and A 2 be weighted adjacency matrices for N vertices, that is, A 1 and A 2 are symmetric, non-negative, N × N real matrices with zeros along the principle diagonal. The (unnormalized) graph Laplacian operator is defined by L n = D n − A n (for n = 1, 2), where D n is a diagonal degree matrix for the adjacency A n , i.e., Given adjacency matrices A 1 and A 2 , the columns of the Laplacian exponential kernels, exp(-tL 1 ) and exp(-tL 2 ), describe the different diffusion patterns centered at each vertex generated by diffusion up to time t under the two sets of weighted edges. Measuring the sum of squared differences between these patterns, summed over all the vertices, yields where ||·|| F is the matrix Frobenius norm. This defines a family of distance measures ξ, depending on the information propagation time t. The graph diffusion distance is given by ξ at the time of maximal difference, i.e., d gdd (A 1 , Here, we compute d gdd (A 1 , A 2 ) by first diagonalizing L 1 and L 2 and using the exponential mapping. Then, Equation (3) allows the computation of ξ(A 1 , A 2 ; t ) for any fixed t. Finally, we optimize over t by a line search to give d gdd (A 1 ,A 2 ).

Tract-based metric
The spatial matching between motion-free and motion-corrected tracts was examined using Cohen's Kappa statistic (70). The streamlines for a specific fiber tract (e.g., CST, IFO.) are first converted to a binary volume with the same dimension and spacing of the raw DWI, where voxels that were occupied by at least one streamline were assigned a value 1. The two tracking results to be matched were then superimposed to identify: (1) voxels that did not contain streamlines in either result (NN), (2) voxels that contain streamlines in both results (PP) and (3) voxel that contain streamlines in one of the results (PN or NP) 4 . The Kappa statistic measures the level of agreement of the tracking results and corrects for agreement expected by chance. Hence, Kappa is computed based on the probability of agreement P(a) and the probability of expected agreement due to chance P(e) as (71),

RESULTS
The fODFs and the whole brain tractography were computed for the 3,150 motion corrected sequences (175 datasets × 18 correction schemes), as well as the motion-free sequences, followed by automatic tractography selection for seven major fiberbundles.

VOXEL-BASED METRICS
The average JSD metric was computed using the fODF reconstruction from the "motion-free" dataset, not corrupted by mixing DWI directions from the tilted-brain scan, as a reference (i.e., presenting only subtle motion inherent to a scan). We differentiated between regions where multiple fibers were detected versus single fiber regions. Figure 3 shows the average JSD values for single and multiple fiber regions as a function of motion corrupted percentage for different SNR levels and as a function of SNR levels for different motion corrupted percentages. Figure 4 illustrates sample reconstructions from motion-free versus motion-corrected datasets for different corrupted percentages and different motion correction choices. Table 1 shows the effect of the denoising process prior to applying motion correction on the average JSD values for single and multiple fiber regions as a function of SNR levels for different motion corrupted percentages. Figure 5 shows the average deviation of local fiber orientations (for the first two dominant detected fibers per voxel) as a function of motion corrupted percentage, as well as SNR levels. Figure 6 shows the average graph diffusion distance (GDD) metric as function of both the corrupted directions percentage and the SNR levels. The metric compares the weighted connectivity graphs from the whole brain tractography of the "motion-free" dataset to that of the motion-corrected datasets. It is worth noting that the tractography of the tilted brain dataset is used as a reference for model-based corrections when the corrupted percentage exceeds 50%. Figure 7 visualizes the brain connectivity being represented circularly using the Circos software (72) where the parcelated structures (refer to Table S1 in Supplementary Material for their full names) are displayed on a connectogram representing left and right hemispheres symmetrically positioned along the vertical axis. The weighted connectivity matrix computed as described in Section 4 was normalized to attain a unit maximum. Each entry in the normalized connectivity matrix corresponds to an interregion link with thickness proportional to the entry weight. To avoid dense visualization, all entries with weight <0.15 were discarded. Table 2 shows the average Cohen's Kappa statistic computed for corpus callosum (CC), corticospinal tract (CST), and inferior fronto-occipital tract (IFO) (where other pathways showed similar trend) based on automatic tractography selection using whole brain tractography of raw datasets (denoised datasets showed similar trends due to the robust fODF estimation, yet their graphs were omitted due to space limitation). Figures 8-12 show sample tractography selections for the aforementioned tracts from the untilted motion-free dataset as well as selections from motion-corrected datasets with different corrupted gradient directions. Correction www.frontiersin.org

FIGURE 3 | The average Jensen-Shannon divergence (JSD) values (lower is better) for reconstructions based on raw datasets (denoised ones share similar performance) as (A) a function of motion corrupted percentage for different SNR levels and (B) a function of SNR levels for different motion corrupted percentage.
The first and third columns show JSDs single fiber regions while the second and fourth columns show such values for reconstructions based on multiple fiber regions. Notice the impact of motion scrubbing (removing gradient directions), which becomes more significant with more motion-corrupted directions when compared to registration-based correction. Further the impact of motion scrubbing is rendered evident for 10% corrupted gradients.  choices shown include outlier-based (i.e., motion scrubbing) and registration-based (using baseline and model-based reference volumes). Also pass-through (in green) and not-pass-through (in red) volumes (i.e., isosurfaces) are shown. Their definitions along with the geometric constraints employed to remove fibers, which do not belong to the pathway of interest, can be found in Ref. (64).

DISCUSSION
In this section, we discuss the impact of different motion correction choices using local as well as global metrics.

VOXEL-BASED RESULTS
Heterogeneous regions are more affected by motion correction, showing larger average JSD in general when compared to the single fiber regions, regardless of the correction mode, interpolation scheme, or reference volume employed (see Figure 3). The impact of motion scrubbing (removing gradient directions) becomes more pronounced with more motion-corrupted directions when compared to registration-based correction (see Figure 3A). Meanwhile, the JSD values indicate minimal deformations in fODFs reconstructed for baseline-based correction at high SNR levels compared to model-based correction, whereas both choices show comparable average JSD values at low SNR levels. This complies with the conclusions presented in Ref. (20).
Forcing the correction and interpolation of all gradient directions shows comparable performance compared to the correction and interpolation of only the corrupted directions (see Figure 3A). This observation discourages the choice of heuristic parameters on motion parameters beyond which directions are claimed to be corrupted and interpolated. Further, interpolation of all directions causes less impact on the reconstructed fODFs at low corrupted percentages (<50%). We can assume, therefore, that motion is omnipresent and can be corrected for by the alignment and interpolation of all gradient directions.
On the interpolation aspect of correction, the sampling theory suggests the sinc kernel as the ideal interpolation kernel; nonetheless, this gives rise to the Gibbs phenomena (i.e., ringing) due to kernel truncation. This explains the smaller fODF deformation when using trilinear interpolation compared to sinc interpolation.

FIGURE 5 | The average fiber orientation deviation (lower is better) for reconstructions based on raw datasets (denoised ones share similar performance) as (A) a function of motion corrupted percentage for different SNR levels and (B) a function of SNR levels for different motion corrupted percentage. The first and third columns show orientation deviation
for the first detected fiber having the largest volume fraction while the second and fourth columns show such values for the second detected fiber having the second largest volume fraction. Notice that local fiber orientations are more affected by motion scrubbing as SNR decreases and/or corrupted directions increase.
Trilinear interpolation, which is much faster, is probably sufficient for motion correction.
In Figure 3B, one can observe the comparable impact of different motion correction choices at low motion corruption percentages (<30%). Whereas with higher motion corruption, a situation that is encountered in studies including infants, for example, motion scrubbing shows a significant impact on the reconstructed fODFs even at high SNR levels. This effect is more pronounced in regions with crossing fibers where the ability to resolve fiber crossings is deteriorated especially as the separation angle of the fibers decreases.
Further, baseline-based motion corrections show minimal JSD values with higher corruption levels (> 50%) when compared to model-based corrections, regardless of the interpolation scheme employed. The difference in performance between baseline-based and model-based becomes more significant as the SNR level increases.
The denoising process yields smaller JSD values for low SNR levels (<12) (see Table 1), while providing comparable performance for baseline-based and model-based motion correction choices. The slight decrease of JSD values for denoised datasets compared to the raw ones is due to the fODF reconstruction processes where we use the constrained spherical deconvolution (CSD) technique (57). In an iterative manner, the deconvolution process in CSD applies a non-negativity constraint on the estimated fODFs as negative fiber orientation densities are physically impossible. This process provides fODFs estimates that preserve the angular resolution while being robust to noise. Yet, as a word of caution, the denoising process, when applied to motion-corrupted datasets, should not take into consideration the joint information from diffusion gradients since voxelwise correspondence between different diffusion volumes is not guaranteed.
In Figure 4, one can observe the significant impact of motion scrubbing (i.e., outlier-based correction) on the reconstructed fODFs for mildly corrupted datasets (e.g., 30% corrupted directions). Further, it can be noticed that with >50% motion corruption, model-based reconstruction infers the spatial position from the gradients corresponding to the tilted brain due to its majority (i.e., gradients of the untilted brain are considered the motion-corrupted directions).
Due to the insufficient number of gradients and unbalanced sampling of the q-space, the impact of motion scrubbing on the estimated fiber orientations becomes evident as SNR decreases and/or corrupted directions increase (see Figure 5).
Although interpolating all directions versus corrupted directions reports comparable orientation deviation with lower impact on fractionally corrupted datasets (<50%), we still favor forcing www.frontiersin.org

FIGURE 6 | The average graph diffusion distance (GDD) (lower is better) for the whole brain tractography derived from the raw datasets (denoised ones share similar performance) as (A) a function of the corrupted directions percentage for different SNR levels and (B) a function of SNR levels for different motion corrupted percentages.
Notice the different behavior displayed by motion scrubbing for ≥50% corrupted directions, which due to having more short tracts connecting nearby region of interests while being assigned to larger weights in the graph construction step.
such a process to all directions to avoid the ad hoc process of thresholding motion parameters.
Nonetheless, one can notice the peaked performance of the orientation deviation at 50% corrupted directions for model-based motion correction choices. The explanation of this phenomenon is based on the fact that, with >50% of the gradients being corrupted (i.e., corresponding to the tilted brain), the formed reference volumes would instead infer its anatomical structure from the tilted brain. For highly corrupted datasets, the gradients corresponding to the untilted brains become the corrupted directions (i.e., a 70% corruption will have a performance similar to the 30% case).
Model-based corrections display higher impact on the JSD of the reconstructed fODFs at higher levels of motion corruption, but such corrections have a smaller impact on the fiber orientation deviations especially when interpolating all directions (trilinear interpolant). This change of JSD metric implies an increase in the overall fODF volume when compared to the reconstructions from the motion-free dataset, yet the fODFs maintain the voxel-wise fiber crossing structure. This observation is more pronounced for fibers with the largest fiber volume fraction.

GLOBAL CONNECTIVITY-BASED RESULTS
Whereas there is a slight performance difference between GDD values computed based on raw datasets versus those from denoised dataset, thanks to the fODF reconstruction that is robust to noise contamination, one may observe consistent findings when GDD is Frontiers in Neurology | Brain Imaging Methods FIGURE 7 | Sample reconstructed connectomic profile (i.e., connectogram) from untilted and tilted motion-free datasets as well as connectograms from motion-corrected datasets with 10, 30, and 70% corrupted gradient directions. Correction choices shown include outlier-based (i.e., motion scrubbing) and registration-based (using baseline and model-based reference volumes). Notice the tendency of motion scrubbing to add more links between nearby ROIs at corruption percentages, implying the detection of more short tracts.
compared to the JSD metric. In particular, the global brain connectivity is least affected by the motion correction step when forcing the alignment and interpolation of all gradient directions without setting a predefined threshold to claim corrupted volumes. There is a significant difference between GDD values obtained from trilinear interpolation compared to sinc interpolation. This implies that the impact of sinc interpolation on the fODFs, being encoded by the JSD metric, yields global brain connectivity that is different from the "motion-free"-based brain connectivity.
Whereas the effect of motion correction is evident at higher corrupted percentages (except for motion scrubbing), one can notice the effect of noise where the impact of motion correction becomes more significant at low SNRs (<12), while different correction choices (except motion scrubbing) render slight performance difference at high SNRs (>12). Moreover, being consistent with different SNR levels, the baseline-based correction choices yield connectivity graphs with minimal deviations (smaller GDD) compared to their corresponding model-based choices.
On the contrary, motion scrubbing displays a different behavior. The GDD values from the scrubbed datasets, though maximal compared to the other correction choices, are decreasing with higher SNR levels for <50% corrupted directions, but this behavior is soon changed to the opposition direction for ≥50% corrupted directions (see Figure 6B). This change of behavior is perceivable in Figure 6A where the GDD values are maximal at 50% corruption percentage for high SNR levels (>12), whereas such a peak occurs even at low corrupted percentages (e.g., 30%) for low SNRs (<12). This phenomenon can be explained as follows: with high percentage of motion-contaminated gradients, the scrubbing (outlier-based) option tends to produce an inadequate set of gradients for accurate fODF estimation due to the exclusion of too many gradients. This unbalanced sampling of the q-space, henceforth, biases the CSD process to converge to an incorrect solution, producing inaccurate fiber orientation, and in turn imprecise brain connectivity. Hence, the increase of the GDD values with higher SNRs beyond 30% corrupted directions is due www.frontiersin.org  to having more short tracts connecting nearby region of interests while being assigned to larger weights in the graph construction step (see 4). In Figure 7, one can observe the motion scrubbing behavior where the links become denser with higher corrupted percentages, implying the detection of more short tracts connecting nearby ROIs. On the other hand, the baseline-based choice reveals comparable connectograms to the motion-free ones while model-based counterpart tends to add more shorter tracts.

TRACT-BASED RESULTS
Being consistent with the results from the other metrics, motion scrubbing shows a significant decrease in the degree of tract agreement when increasing the percentage of motion corruption, which in turn leads to discarding more gradient directions.
With <50% corrupted directions, the tract agreement degree increases with higher SNR levels, yet such a trend changes with ≥50% where shorter or no tracts being detected, which deviates from being anatomically realistic; see, for example, the top row of Figures 8-12 where tracts can be even missed even at 70% corruption. The CST and IFO tracts are good examples of long tracts that are not recovered by motion scrubbing beyond 10% motion corruption, see Figures 10 and 11. Nonetheless, the maximal agreement is achieved when aligning and interpolating all gradient directions to correct for motion regardless of the reference volume used in the registration process (i.e., baseline versus model-based). It can be observed in Figures 8-12 that modelbased motion correction is able to recover longer tracts at high corruption percentages compared to the baseline-based motion correction.
www.frontiersin.org FIGURE 8 | Sample tractography selection for the corpus callosum (CC) from the untilted motion-free dataset as well as selections from motion-corrected datasets with 10, 30, and 70% corrupted gradient directions. Correction choices shown include outlier-based (i.e., motion scrubbing) and registration-based (using baseline and model-based reference volumes). One can observe the short tracts being detected by motion scrubbing at high corruption percentages due to the exclusion of too many gradient directions. outlier-based (i.e., motion scrubbing) and registration-based (using baseline and model-based reference volumes). Notice the inability of motion scrubbing to detect an anatomically realized CG at high corrupted percentages. outlier-based (i.e., motion scrubbing) and registration-based (using baseline and model-based reference volumes). Note that motion scrubbing cannot recover long tracts such as CST beyond 10% motion corruption. FIGURE 11 | Sample tractography selection for the inferior fronto-occipital tract (IFO) from the untilted motion-free dataset as well as selections from motion-corrected datasets with 10, 30, and 70% corrupted gradient directions. Correction choices shown include outlier-based (i.e., motion scrubbing) and registration-based (using baseline and model-based reference volumes). Note that motion scrubbing cannot recover long tracts such as IFO beyond 10% motion corruption. Further, motion-based motion correction tends to recover longer tracts at high motion corruption compared to baseline-based correction.

Frontiers in Neurology | Brain Imaging Methods
www.frontiersin.org FIGURE 12 | Sample tractography selection for the uncinate fasciculus (UNC) from the untilted motion-free dataset as well as selections from motion-corrected datasets with 10, 30, and 70% corrupted gradient directions. Correction choices shown include outlier-based (i.e., motion scrubbing) and registration-based (using baseline and model-based reference volumes). Notice the inaccurate UNC tract being detected from the motion scrubbing choice at high percentages of motion corruption.

CONCLUSION: GUIDELINES FOR MOTION CORRECTION IN HARDI ACQUISITIONS
Although there is excellent theoretical work on DWI acquisition parameters and ODF reconstruction schemes, as well as their effects on the quality and crossing fiber resolution, standard users lack clear guidelines and recommendations on the best ways to approach and correct for motion in practical settings. This work investigated motion correction using transformation and interpolation of affected DWI directions versus the exclusion of subsets of DWIs, and its impact on the reconstructed fODFs, local fiber orientations, brain connectivity, and detection of fiber tracts. The various effects were systematically explored and illustrated via living phantom data, leading to the general conclusion that motion, even subtle, exists in every acquired DW scan and special care is needed to correct for motion. In the following, we summarize the findings of our analysis, which might serve as guidelines for users in practice: -Although least recommended, motion scrubbing (removing corrupted gradient directions) can be used in studies with wellcontrolled environments and involving not-in-pain adults or sedated subjects, where minimal subject motion is anticipated (i.e., <10%motion corruption). Yet, this gradient removal should not result in unbalanced sampling of the q-space since the gradient distribution should be as uniform as possible on the sphere.
-Voxel-wise reconstructions, tractography, and global brain connectivity are least affected by the motion correction step when forcing the alignment and interpolation of all gradient directions without setting predefined thresholds to claim corrupted volumes. -Using voxel-wise reconstructions that are robust to noise, the denoising process can be considered unnecessary prior to applying motion correction. Nonetheless, if applied, the denoising algorithms should not take into account joint information from different diffusion gradients since voxel-wise correspondence is not guaranteed. -Baseline-based correction choices can be used in studies involving voxel-wise scalars, which depend on the volume of the reconstructed ODFs, especially with highly motion-corrupted datasets. -Model-based correction choices, on the other hand, are recommended for studies requiring the recovery and analysis of long tracts, e.g., CST and IFO, especially with highly motion-corrupted datasets. -Trilinear interpolation, although much faster compared to sinc, is probably sufficient for motion correction, where the global brain connectivity is least affected.
One may wonder that using a gold standard, which was obtained by motion correction (among other QC steps) using some of the methods under investigation could raise questions on reliability of the conclusions presented. Hence, in order to support Frontiers in Neurology | Brain Imaging Methods the validity of the conclusions drawn from this study, we conducted the same set of experiments using the raw acquired data without performing any quality control. Figure 13 shows a sample result of the average JSD and local fiber orientation deviation metric for reconstructions based on gold standards generated from the QCed phantom datasets as well as the raw phantom datasets. Being consistent with the conclusions drawn from the reconstructions based on the QCed datasets, regions with crossing fibers are more affected by motion correction, showing larger average JSD in general when compared to the single fiber regions. The impact of motion scrubbing becomes more evident with more motioncorrupted directions when compared to the registration-based correction. Moreover, the peaked performance of the orientation deviation at 50% corrupted directions for model-based motion correction is also maintained. Further, forcing the interpolation of all gradients directions would have minimal impact on the reconstructions when compared to the choice of interpolating motion corrupted directions via setting a predefined threshold beyond which a direction is claimed to be corrupted.

LIMITATIONS AND FUTURE WORK
The primary message of this paper is that care should be taken in deciding the processing pipeline for any DW-MRI (esp. HARDI) at hand, this involves, for example, the acquisition protocol (i.e., less redundant gradients would discourage the choice of motion scrubbing) and the participating subjects (i.e., elderly in pain, infants, unsedated subjects versus healthy adults where variable motion severity levels are anticipated). Nonetheless, the presented analysis attains some limitations, which can be outlined as follows: • One-subject analysis: as a controlled motion experiment, we could use a scan session of subjects with repeated scans where the second shows bulk motion relative to the first one. The existing phantom data contain repeated scans taken in different sessions within 24 h and hence they have to be seen as independent scans for the same subject. As a pilot study, we therefore asked one healthy volunteer to be scanned twice in a single scan session while tilting the head between the two scans. This enables us to mix gradients between the two scans from the same subject; this cannot be done with the existing repeated independent scans. We understand that reporting our results with more than a pair of datasets (tilted and untilted brain) would support our analysis, and we will collect more scans with this experimental design in our future annual phantom scan sessions. Nonetheless, we think that this experiment, even with its limitations, contributes to establish an experimental framework that would guide the scientific community in systematically evaluating the outcomes of different preprocessing steps. In the future, we will prospectively plan to obtain more of such datasets, also including navigator shots for estimation of rotation, to extend this analysis. www.frontiersin.org • Anatomical geometric correction: echo-planner imaging (EPI) distortion, in contrast to Eddy current that affects only diffusionweighted images, would affect all images in the acquired sequence regardless of their level of diffusion sensitization. Hence, EPI distortion correction would involve acquiring additional data for either B0 mapping or a dedicated T1 or T2weighted structural target. That's a primary reason behind ignoring EPI correction in most MRI processing pipelines (73). With the availability of such additional data, EPI correction would involve non-linear spatial warping that employ interpolation, a decision variable under investigation of the presented work. Hence, we favored to bypass this step in order not to intermingle interpolation due to motion correction and that of EPI correction. However, we think that the analysis/correction of inter-gradient spatial distortions, and its effect on ODF reconstruction, is an important issue, which we together with the scientific community need to address. • Better gold standard generation: the living phantoms were healthy volunteers who were aware of the whole process and were keen to remain without motion. Nonetheless, the investigation of prospective navigators is a promising idea for future work to provide different types of ground truth data and to get motion estimates directly from the scanner rather than only via post-processing.