Evaluation of Six Phase Encoding Based Susceptibility Distortion Correction Methods for Diffusion MRI

Purpose: Susceptibility distortions impact diffusion MRI data analysis and is typically corrected during preprocessing. Correction strategies involve three classes of methods: registration to a structural image, the use of a fieldmap, or the use of images acquired with opposing phase encoding directions. It has been demonstrated that phase encoding based methods outperform the other two classes, but unfortunately, the choice of which phase encoding based method to use is still an open question due to the absence of any systematic comparisons. Methods: In this paper we quantitatively evaluated six popular phase encoding based methods for correcting susceptibility distortions in diffusion MRI data. We employed a framework that allows for the simulation of realistic diffusion MRI data with susceptibility distortions. We evaluated the ability for methods to correct distortions by comparing the corrected data with the ground truth. Four diffusion tensor metrics (FA, MD, eigenvalues and eigenvectors) were calculated from the corrected data and compared with the ground truth. We also validated two popular indirect metrics using both simulated data and real data. The two indirect metrics are the difference between the corrected LR and AP data, and the FA standard deviation over the corrected LR, RL, AP, and PA data. Results: We found that DR-BUDDI and TOPUP offered the most accurate and robust correction compared to the other four methods using both direct and indirect evaluation metrics. EPIC and HySCO performed well in correcting b0 images but produced poor corrections for diffusion weighted volumes, and also they produced large errors for the four diffusion tensor metrics. We also demonstrate that the indirect metric (the difference between corrected LR and AP data) gives a different ordering of correction quality than the direct metric. Conclusion: We suggest researchers to use DR-BUDDI or TOPUP for susceptibility distortion correction. The two indirect metrics (the difference between corrected LR and AP data, and the FA standard deviation) should be interpreted together as a measure of distortion correction quality. The performance ranking of the various tools inferred from direct and indirect metrics differs slightly. However, across all tools, the results of direct and indirect metrics are highly correlated indicating that the analysis of indirect metrics may provide a good proxy of the performance of a correction tool if assessment using direct metrics is not feasible.


INTRODUCTION
Analysis of diffusion MRI data is confounded by the presence of susceptibility distortions, caused by an off-resonance field induced by differences in magnetic susceptibility at the airtissue interface. There are a number of techniques available for correcting susceptibility distortions. Broadly, these techniques can be divided into three types: registration based (RB) methods, fieldmap based (FB) methods and phase encoding based (PB) methods. The first approach involves registration of the distorted image to a structural image without distortions. The second approach involves estimating a map of the B 0 inhomogeneity, and using this along with information about the diffusion acquisition protocol to correct for the distortions. The third approach is based on estimating the underlying distortions using additional data acquired with a different phase encoding direction. For example, it is common to collect LR (left right) and RL (right left) images, or AP (anterior posterior) and PA (posterior anterior) images. Phase encoding based techniques have been demonstrated to outperform the other two approaches (Esteban et al., 2014;Graham et al., 2017), at the cost of a longer scan time.
The lack of ground truth means that evaluations are typically indirect or qualitative (Jezzard and Balaban, 1995;Wu et al., 2008;Bhushan et al., 2012;Ruthotto et al., 2013;Fritz et al., 2014;Irfanoglu et al., 2015Irfanoglu et al., , 2018Taylor et al., 2016;Hedouin et al., 2017;Wang et al., 2017). Only a few investigations have been carried out with the presence of a ground truth for evaluation of susceptibility distortion correction (Andersson et al., 2003;Esteban et al., 2014;Graham et al., 2017). Hedouin et al. (2017) compared aDC, aBMDC, and TOPUP using phantom data and human data. For the phantom data, both the aBMDC and TOPUP corrected images appear visually similar for correcting  Andersson et al., 2003 the distortion, while aDC gives visually poorer results. aBMDC outperformed aDC and TOPUP by obtaining smaller landmark position errors. For human data, images corrected using aDC contain a mismatch around the lateral ventricles compared with respect to a structural (T1-weighted) image. aBMDC and TOPUP both obtain a corrected image very close to the structural T1-weighted image. aBMDC and TOPUP show a very high similarity between the two corrected images C AP+PA and C LR+RL , outperforming aDC. Irfanoglu et al. (2015) compared EPIC, DR-BUDDI, and TOPUP using human data. DR-BUDDI produced sharper images than EPIC and TOPUP, showing clearly visible tissue interfaces. Areas such as the inferior temporal lobes and the olfactory bulbs were more accurately reconstructed by DR-BUDDI than EPIC and TOPUP. DR-BUDDI resulted in the lowest variability between the two corrected images C AP+PA and C LR+RL , followed by TOPUP and then EPIC. Overall, DR-BUDDI corrected images showed a higher correlation with the undistorted T2-weighted image than did EPIC and TOPUP.
In this work, we undertake a comparison of six phase encoding based methods for susceptibility distortion correction using both simulated diffusion data and real diffusion data, see Table 2 for differences between our study and previous comparisons. A brief summary of the six methods is as follows.
• aDC: The aDC method estimates the displacement field based on the image cumulative distribution function along each line in the PE direction. This method is rather sensitive to noise and it realigns every line independently, so in order to reduce the effect of noise and increase the coherence between the corrected lines, a 3D Gaussian smoothing is applied on the estimated displacement field, which leads to a trade-off between regularity and precision.
• aBMDC: The aBMDC method adopts the symmetric diffeomorphic image registration idea from Avants et al. (2008) to make the transformed LR (or AP) and RL (or PA) images as similar as possible. The transformation field is obtained using a block-matching algorithm (Ourselin et al., 2000). The squared correlation coefficient is used as the similarity measure between blocks.
• DR-BUDDI: The DR-BUDDI method also adopts the symmetric diffeomorphic image registration idea from Avants et al. (2008). The first part of the cost function is designed to maximize the similarity between the structural image and the transformed LR/RL (or AP/PA) images. The second part of the cost function is designed to maximize the similarity between the structural image and the geometric average of the transformed LR and RL images. The transformation is constrained along the PE direction. Two co-dependent transformations [one for mapping LR to RL (or AP to PA), the other for mapping RL to LR (or PA to AP)] are used to account for differences in the inhomogeneity between LR (or AP) and RL (or PA) acquisitions. Unlike other methods, DR-BUDDI estimates the transformation not only for b 0 images but based on a weighted sum over different diffusion weighted images, in order to preserve tract structures.
• EPIC: The EPIC method estimates the displacement field by minimizing the sum of squared differences between the transformed LR (or AP) and RL (or PA) images. Two regularization terms are used to regularize the smoothness and amplitude of the displacement. The undistorted image can then be restored using the estimated displacement field.
• HySCO: The HySCO method adopts the physical distortion model from Chang and Fitzpatrick (1992) and minimizes a distance function to estimate the inhomogeneity, such that the transformed LR (or AP) and RL (or PA) images become as similar as possible. Two regularization terms are used to ensure a differentiable inhomogeneity and positive intensity modulations.
• TOPUP: The TOPUP method models the displacement field as a linear combination of basis warps consisting of a truncated 3D cosine transform. The weights of the basis warps are estimated using an iterative procedure by minimizing the sum of squared differences between the transformed LR (or AP) and RL (or PA) images. Two options are available for obtaining the distortion free image, they are based on least-squares and Jacobian modulation. The resolution of the displacement field is limited by the highest frequency component of the cosine transform.
We used the POSSUM (Drobnjak et al., 2006(Drobnjak et al., , 2010 based diffusion MRI simulator (Graham et al., 2016(Graham et al., , 2017, in order to produce realistic diffusion data with susceptibility distortions typically seen in real data. Simulated data can provide ground truth that enables direct and quantitative evaluation. Our analysis directly measures the ability to correctly recover distortion-free data by comparing the corrected b 0 image with its ground truth. We also investigate the suitability of two commonly used indirect metrics, i.e., the difference of the corrected data from the LRRL and APPA pairs (Ruthotto et al., 2013;Graham et al., 2017), and the FA standard deviation over the corrected LR, RL, AP, and PA data (Wu et al., 2008;Irfanoglu et al., 2015). We hope that this work will enable researchers to make more carefully informed choices when designing their processing pipelines.

Simulated Data
The diffusion data was simulated with 11 volumes of b = 700 s/mm 2 , 12 volumes of b = 2, 000 s/mm 2 and 1 volume of b = 0. The input to the POSSUM diffusion MRI simulator is a collection of three 3D anatomical volumes: gray matter, white matter and cerebro-spinal fluid (CSF). The voxel values in these segmentations reflect the proportion of tissue present in each voxel, in the range [0,1]. The input was generated from the T1-weighted structural image from HCP using the FSL function FAST (Zhang et al., 2001). The representation of diffusion weighting was achieved by a spherical harmonic fit of order n = 8 to the b = 1, 000 s/mm 2 shell of the diffusion data, using the Dipy (Garyfallidis et al., 2014) module reconst.shm. The MR parameters used are listed in Table 3. We used a matrix size of 72 × 86, 55 slices and a voxel size of 2.5 mm isotropic. The TE was 109 ms, the TR was 9.15 s and the flip-angle was 90 • . The PE bandwidth per pixel was 17.1 Hz for LR and RL data, and 14.3 Hz for AP and PA data. The fieldmap was generated from one phase difference volume and two magnitude volumes (one for each echo time) from HCP using the FSL function fsl_prepare_fieldmap (Jenkinson, 2003). To generate a tight brain extration for fsl_prepare_fieldmap, the brain mask created by the FSL function BET (Smith, 2002) was further eroded using a 5 mm box kernel. The generated fieldmap was linearly registered to the T1-weighted structural image using the FSL function FLIRT (Jenkinson and Smith, 2001;Jenkinson et al., 2002). Diffusion data was simulated with four PE directions, i.e., left-right (LR), right-left (RL), anterior-posterior (AP) and posterior-anterior (PA). No other distortions (e.g., eddy-currents and head motion) were included in the simulations. We also simulated a ground truth set, acquired with the same acquisition parameters but no input susceptibility fieldmap. We simulated diffusion data for five subjects (100206, 100307, 100408, 100610, 101006) from the Human Connectome Project (HCP) 1 (Glasser et al., 2013;Van Essen et al., 2013).

Real Data
We used 40 subjects from the developing HCP project (Hughes et al., 2017;Bastiani et al., 2019). It provides diffusion data acquired with four PE directions: AP, PA, LR and RL, enabling evaluation using the indirect metric (i.e., comparing APPA corrected to LRRL corrected). The data was acquired on a 3T Philips Achieva scanner and consists of 4 shells: 20 volumes of b = 0, 64 volumes of b = 400 s/mm 2 , 88 volumes of b = 1, 000 s/mm 2 and 128 volumes of b = 2, 600 s/mm 2 . The data was acquired using TR = 3,800 ms and TE = 90 ms. The matrix size is 128×128, the number of slices is 64 and the acquired voxel size is 1.17 × 1.17 × 1.5 mm.

METHODS
For the simulated data, we used the FSL function BET (Smith, 2002) to create the brain mask from the distortion-free b 0 image. Diffusion tensor fitting and FA calculation were performed using the FSL function dtifit. For simulated data, we evaluated the FIGURE 1 | GT: simulated diffusion data (without distortions) for HCP subject 100206. Fieldmap: the real fieldmap used to simulate the distortions. LR: simulated diffusion data with LR distortion. RL: simulated diffusion data with RL distortion. AP: simulated diffusion data with AP distortion. PA: simulated diffusion data with PA distortion.
FIGURE 2 | Three levels of susceptibility distortion were simulated. 1 field is the original fieldmap. 1/2 field is the original fieldmap divided by 2. 1/4 field is the original fieldmap divided by 4.
Frontiers in Neuroinformatics | www.frontiersin.org ability of each method to recover the correct intensity at each voxel, by computing error maps between the distortion corrected data and ground truth. We also investigated two indirect metrics. One is the difference of the corrected data from the LRRL and APPA pairs (Ruthotto et al., 2013;Graham et al., 2017), and the other is the FA standard deviation over the corrected LR, RL, AP and PA data (Wu et al., 2008;Irfanoglu et al., 2015). Comparing the uncertainty of data from different preprocessing pipelines is a way to determine if one pipeline is better than the other (Sjölund et al., 2018;Gu et al., 2019). For the real data, we used the brain mask provided with the dataset. We corrected for head motion between LR, RL, AP and PA scans by registering all volumes to the first volume using the FSL function FLIRT (Jenkinson and Smith, 2001;Jenkinson et al., 2002). For real data, we used indirect evaluation. We compared the corrected data from the LRRL pair with the result from the APPA pair, since ideally the corrected results from the two pairs should be the same.
For each susceptibility distortion correction tool we used default settings and steps provided by the software's basic help documentation unless otherwise stated. For DR-BUDDI, we used the command DR_BUDDI_withoutGUI because it is faster than DR_BUDDI. By default, the first step in DIFFPREP performs Gibbs ringing correction, denoising, head motion correction and eddy-current distortion correction, prior to the second step susceptibility distortion correction by DR-BUDDI. However, the simulated data in this paper have only susceptibility distortions, corrections by DIFFPREP are therefore not necessary and can even be counterproductive. Additionally, the use of a structural image in DR-BUDDI will introduce deviation from the ground truth due to the registration between diffusion and structural images. Therefore, we ran DR-BUDDI for the simulated data without the corrections from DIFFPREP and without a structural image. For the real data, we ran DR-BUDDI with the default setting together with DIFFPREP. There is no method referred to in any of the EPIC documentation for choosing an alternate phase direction than the y-direction (or the LRRL direction with regards to this study). We manually rotated the LR and RL data 90 degrees in the x − y plane before feeding them to EPIC.        The data was finally rotated 90 • in the other direction after correction. We share our processing scripts on Github 2 , such that other researchers can reproduce and extend our findings (Eklund et al., 2017). Figure 1 shows the simulated diffusion data and the fieldmap for HCP Subject 100206, the simulated distortions look very realistic. We investigated how different levels of distortion affect the correction performance by simulating three levels of susceptibility distortion, as shown in Figure 2. The distortion level was controlled by dividing the fieldmap by a factor 1, 2, or 4.

Direct Metric
In this section, we present the results for the direct metric, including the error maps of b 0 , FA, MD, principal eigenvalues and principal eigenvectors. Figure 3 shows the corrected b 0 images using six different methods, along with error maps obtained by calculating the difference compared to the ground truth images. Correction was carried out for LRRL and APPA pairs, FIGURE 10 | Difference between corrected b 0 volumes from LR and AP for simulated HCP Subject 100206 using the six methods.
FIGURE 11 | MAD and MSD of b 0 from corrected LR and AP for five simulated HCP subjects using the six methods. The error bars represent the standard deviation over subjects. DR-BUDDI provides the smallest difference between the two corrections.
Frontiers in Neuroinformatics | www.frontiersin.org respectively, and we used the corrected LR and AP images as the results for the two pairs. aDC and aBMDC show large errors for edge voxels. DR-BUDDI, EPIC, HySCO, and TOPUP produced very small errors for both LR and AP cases, mainly along edges. Figure 4 shows the error maps using the six methods for the three levels of distortion. Correction was carried out for LRRL and APPA pairs, respectively. In addition to visual inspection, we computed the mean absolute error (MAE) and the mean squared error (MSE) within the brain and their standard deviations (error bars) for five simulated HCP subjects using the six methods, as shown in Figure 5. The results confirmed what we observed in Figure 4 and quantitatively demonstrates the accuracy and robustness of the six methods. The MAE and MSE decreased with decreasing inhomogeneity field strength, aligned with our predictions. Similarly, Figures 6-9 show the error maps of FA, MD, principal eigenvalues and principal eigenvectors after diffusion tensor fitting of the corrected data. The error map of principal eigenvectors was obtained by calculating the angle (in degree) between the principal eigenvector and its ground truth. All error maps are quantitatively summarized in Table 4.

Indirect Metric
In this section, we present the results for the indirect metric, including b 0 difference maps and the FA standard deviation over the corrected LR, RL, AP, and PA data. To investigate the suitability of LRRL-APPA differences as an indirect metric we plot the corrected LR and AP data, and their differences, as shown in Figure 10. Ideally, the corrected LR and the corrected AP would be identical. The whole-brain mean absolute difference (MAD) and mean squared difference (MSD) were computed for every correction method, as shown in Figure 11. The results show larger MAD and MSD for aDC and aBMDC compared to the other four methods DR-BUDDI, EPIC, HySCO, and TOPUP. The results are not completely consistent with the previous results in Figure 5 for the direct metric. These results demonstrate that the indirect metric (difference maps) shows a FIGURE 13 | Mean FA standard deviation over LR, RL, AP, and PA data for five simulated HCP subjects using the six methods. The error bars represent standard deviation of this metric across subjects. DR-BUDDI produces the smallest standard deviation.  different ordering of correction performance compared to the direct metric (b 0 error maps).
To investigate the suitability of FA standard deviation as an indirect metric we plot the FA standard deviation over the corrected LR, RL, AP, and PA data, as shown in Figure 12. Ideally, the corrected LR, RL, AP, and PA would be identical, which would result in zero FA standard deviation. The wholebrain mean of the standard deviation was computed for every correction method, as shown in Figure 13. The results show the largest mean FA standard deviation for EPIC. DR-BUDDI and TOPUP produced very small FA standard deviation over LR, RL, AP, and PA data, and the standard deviation of this metric is also low across subjects. The indirect metric (FA standard deviation) served as a good indication for correction performance compared to the direct metric (FA error maps) as shown in Figure 6. Figure 14 shows the distorted diffusion data from a dHCP subject scanned with four different PE directions. The distortions are more pronounced in regions close to tissue-air interfaces, such as the frontal poles and the temporal lobes near the petrous bone. To evaluate the six methods, we computed the difference between the corrected LR and AP for the dHCP data, as shown in Figure 15. Higher difference values are visible in regions most affected by magnetic susceptibility variations such as the boundary regions of the brain. Figure 16 reports the whole-brain mean of the difference, for the 40 processed subjects. DR-BUDDI, EPIC, HySCO, and TOPUP performed almost equally well, which resembled the results for simulated data given in Figure 11. We plot the FA standard deviation over the corrected LR, RL, AP, and PA data for the 40 processed subjects in Figure 17. It confirmed what we observed in Figure 13 for results of simulated data.

Discussion
In this paper, we used both simulated and real data to evaluate six phase encoding based methods for correcting susceptibility distortions. This work is important given that phase encoding based methods have been demonstrated to outperform the other two classes of approaches, and are very frequently used in diffusion data analysis pipelines. It is thus essential to carefully evaluate phase encoding based correction techniques and their limitations. By this work we aim to answer the following two questions. Which method of the six provides the best distortion correction? Are the indirect metrics suitable for measuring the distortion correction performance?
The error map directly measures the ability to correctly recover distortion-free data for different methods. Based on our experiments, we found that DR-BUDDI, EPIC, HySCO, and TOPUP were generally superior to the other two methods in achieving better performance for the error map of b 0 , see Figure 5. We then applied the transformation field obtained from b 0 images to diffusion weighted volumes and calculated FA, MD, eigenvalues and eigenvectors after diffusion tensor fitting. The FIGURE 15 | Difference between corrected b 0 volumes from LR and AP for dHCP Subject CC00069XX12 using the six methods.
FIGURE 16 | MAD and MSD of b 0 from corrected LR and AP for 40 dHCP subjects using the six methods. The error bars represent the standard deviation over subjects. In all cases DR-BUDDI produces the smallest difference. error maps of these four metrics demonstrated that DR-BUDDI and TOPUP provided the most accurate and robust distortion corrections among the six methods. EPIC and HySCO performed well in correcting b 0 images but produced poor corrections for diffusion weighted volumes, and these two methods produced rather large errors in terms of the four diffusion tensor metrics as shown in Figures 6-9. Therefore, the error map of b 0 should be interpreted together with the error maps of diffusion metrics for a better evaluation of the correction quality. It is notable that EPIC showed large errors and very large variance of performance over subjects for the LRRL case when the inhomogeneity field strength was 1. The reason might be that EPIC was originally designed for the APPA case and there is no method referred to in any of the EPIC documentation for choosing an alternate phase direction than the y-direction (or the LRRL direction with regards to this study). With the phase encode direction chosen in the APPA dimension, susceptibility distortion is manifested as stretching or compression of the image in the APPA direction, which can be more desirable than asymmetric distortion in the LRRL direction (Glover et al., 2012). This might explain that EPIC was designed only for the APPA case. It is indeed often the case that the phase encoding direction is chosen to be the y-direction, however there are many images where this is not the case (Kemper et al., 2015;Hughes et al., 2017). TOPUP was found to work only for data volumes with even size on x, y, and z directions since the default configuration file (b02b0.cnf ) performs a factor of 2 subsampling. There are two solutions suggested to eliminate this problem; either cropping or adding dummy data to the 3D volume.
Indirect metrics are often used to evaluate distortion correction for real data due to the absence of ground truth. We investigated the ability of two indirect metrics to measure the correction quality. We validated two of the most promising indirect metrics for correction quality, i.e., the difference between corrected LR and AP data, and the FA standard deviation over the corrected LR, RL, AP, and PA data. The first indirect metric, as shown in Figures 11, 16, roughly confirmed what we observed for the error map of b 0 in Figure 5, although the ordering of correction quality is slightly different. Similarly, the FA standard deviation over the corrected LR, RL, AP, and PA data, as shown in Figure 13, confirmed what we observed for the error map of FA in Figure 6.

Limitations
Before comparing the quality of the corrections provided by each of these tools, it is important to note that each tool has its customizable parameters. The default parameter settings were used in this work, as it was reasoned that this would be representative of the way the tools were most often used. It was also reasoned that changing default parameters or influencing the inputs prior to correction would lead to a less fair comparison.
It should be noted that we used different DR-BUDDI settings for simulated data and real data, because the default pipeline is designed for real data with different types of distortions, but our simulated data only contain susceptibility distortions. It is therefore possible that the corrections in this work may not represent the best corrections attainable by use of these tools.

AUTHOR CONTRIBUTIONS
XG and AE contributed to the conception and design of the study. XG performed the diffusion MRI data analysis, statistical analysis, and wrote the draft of the manuscript. AE provided comments to the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.