PyHySCO: GPU-enabled susceptibility artifact distortion correction in seconds

Over the past decade, reversed gradient polarity (RGP) methods have become a popular approach for correcting susceptibility artifacts in echo-planar imaging (EPI). Although several post-processing tools for RGP are available, their implementations do not fully leverage recent hardware, algorithmic, and computational advances, leading to correction times of several minutes per image volume. To enable 3D RGP correction in seconds, we introduce PyTorch Hyperelastic Susceptibility Correction (PyHySCO), a user-friendly EPI distortion correction tool implemented in PyTorch that enables multi-threading and efficient use of graphics processing units (GPUs). PyHySCO uses a time-tested physical distortion model and mathematical formulation and is, therefore, reliable without training. An algorithmic improvement in PyHySCO is its use of the one-dimensional distortion correction method by Chang and Fitzpatrick to initialize the non-linear optimization. PyHySCO is published under the GNU General Public License and can be used from the command line or its Python interface. Our extensive numerical validation using 3T and 7T data from the Human Connectome Project suggests that PyHySCO can achieve accuracy comparable to that of leading RGP tools at a fraction of the cost. We also validate the new initialization scheme, compare different optimization algorithms, and test the algorithm on different hardware and arithmetic precisions.


Introduction
Reversed Gradient Polarity (RGP) methods are commonly used to correct susceptibility artifacts in Echo-Planar Imaging (EPI) [Stehling et al., 1991]. RGP methods acquire a pair of images with opposite phase encoding directions, which leads to a minimal increase in scan time due to the speed of EPI. In a post-processing step, RGP approaches use the fact that the distortion in both images has an equal magnitude but acts in opposite directions to estimate the field map (see Figure 1) [Chang and Fitzpatrick, 1992, Bowtell et al., 1994]. The field map is then used to estimate a distortion-free image.
Compared to other correction approaches such as field map acquisition, point-spread function map acquisition, and anatomical registration, RGP methods generally achieve comparable or superior accuracy while being more robust to noise and motion; see, e.g., [Tax et al., 2022, Graham et al., 2017, Esteban et al., 2014, Wu et al., 2008]. These advantages make RGP correction a popular choice. For example, the widely used MRI database from the Human Connectome Project (HCP) [Van Essen et al., 2012] used the RGP correction tool TOPUP [Andersson et al., 2003] in the preprocessing of released diffusion MRI from EPI scans.
The original RGP distortion correction approaches in [Chang and Fitzpatrick, 1992, Bowtell et al., 1994] are one-dimensional, treating each image column separately in the phase encoding direction. This leads to a non-smooth field map estimate and corrections. TOPUP addresses this non-smoothness with a 3D spline-based approach and the introduction of regularization [Andersson et al., 2003]. TOPUP has limited support for hyperthreading and is often a time-consuming step of MRI processing pipelines [Cai et al., 2021]. Running TOPUP on a standard CPU took over 60 minutes on average per HCP subject in our experiments.
Although less widely used than TOPUP, other iterative methods have proposed implementations of RGP correction employing various optimization schemes, discretizations, and regularization terms to speed up the correction. EPIC [Holland et al., 2010] introduces correction using a nonlinear image registration framework. The tool was developed specifically for Anterior-Posterior distortions and can be less effective for Left-Right distortions [Gu and Eklund, 2019]. DR-BUDDI [Irfanoglu et al., 2015] and TISAC [Duong et al., 2020a] regularize the optimization using a T2-weighted or T1-weighted image, respectively. Including undistorted anatomical information can improve the quality of distortion correction [Gu and Eklund, 2019], but complicates the choice of an effective distance measure and, depending on the protocol, may require additional scan time. HySCO introduces hyperelastic registration regularization and a novel separable discretization [Ruthotto et al., 2012, Ruthotto et al., 2013, Macdonald and Ruthotto, 2017]. HySCO can accurately correct real and simulated data varying in phase encoding direction, anatomy, and field of view [Gu and Eklund, 2019, Snoussi et al., 2021, Tax et al., 2022]. In our experiments, HySCO runs on the CPU for one to two minutes per HCP subject on average. While HySCO is a Statistical Parametric Mapping (SPM) [Penny et al., 2007] plugin and has been integrated into a comprehensive SPM-based DTI processing pipeline [Dávid et al., 2024], its dependency on a MATLAB license may limit its wider use.
Motivated by the long processing times of the above RGP tools, several deep learning approaches for susceptibility artifact correction have been proposed recently; see, e.g., [Hu et al., 2020, Duong et al., 2020b, Duong et al., 2021, Zahneisen et al., 2020, Alkilani et al., 2023]. A recurrent theme is to train a correction operator in an offline stage in a supervised way using training data, which enables fast evaluations in the online step. For example, training S-Net on 150 volumes took over 5 days, while correcting an image pair on a CPU took an average of 2.8 seconds (0.96 seconds on a GPU) [Duong et al., 2020b]. However, the dramatic reduction of correction time comes at the cost of losing the robustness and generalizability that existing RGP approaches obtain from the physical distortion model. For example, while RGP approaches can handle images from different scanners, anatomies, resolutions, and other acquisition parameters, deep learning models perform poorly when applied outside the training distribution [Chen et al., 2022]. Furthermore, deep learning models have proven highly sensitive to noise and adversarial attacks in other contexts [Antun et al., 2020].
The PyHySCO toolbox aims to achieve the accuracy, robustness, and generalizability of state-of-the-art RGP approaches at computational costs similar to evaluating a pre-trained deep learning model. PyHySCO offers EPI distortion correction through a GPU-enabled and command-line-accessible Python tool powered by PyTorch [Paszke et al., 2019]. The mathematical formulation is based on HySCO, augmented by the separable discretization of [Macdonald and Ruthotto, 2017] that increases parallelism and an optimal transport-based initialization step that alleviates the need for multilevel optimization. We demonstrate the use of PyHySCO through its Python interface and command-line tool, which is compatible with existing MRI postprocessing pipelines.
The remainder of our paper is organized as follows. In Section 2, we review the mathematical model and its discretization under the hood of PyHySCO and introduce our novel parallelized initialization using optimal transport, fast solvers exploiting the separable structure, and the GPU-enabled PyTorch implementation. In Section 3, we extensively validate PyHySCO on real and simulated EPI data. We show the speed and accuracy of the proposed parallelized optimal transport initialization scheme and of the complete correction pipeline across optimizers, GPUs, and levels of numerical precision. In Section 4, we discuss the benefits and implications of using PyHySCO for EPI distortion correction. Section 5 concludes the paper.

Methods
This section describes the algorithmic and coding structure of PyHySCO. Section 2.1 introduces our notation and reviews the mathematical formulation of the RGP correction problem. Section 2.2 introduces a novel optimal transport-based field map initialization. Section 2.3 describes the optimization algorithms available in PyHySCO. Section 2.4 explains the structure of the code and some key implementation details. Section 2.5 demonstrates the basic usage of PyHySCO and how to integrate it into existing processing pipelines.

Mathematical Formulation
The field map estimation and distortion correction are based on the physical forward model defined in [Chang and Fitzpatrick, 1992]. Let v ∈ R^3 be the phase encoding direction for the distorted observation I : Ω → R, and let Ω ⊂ R^3 be the image domain of interest. The mass-preserving transformation operator that, given the field map b : Ω → R, corrects the distortions of an image I acquired with phase-encoding direction v reads

T[I, b](x) = I(x + b(x)v) · (1 + ∂_v b(x)).    (1)

Here, ∂_v b is the directional derivative of b in the direction of v. The first term of the operator corrects the geometric deformation in the direction of v, and the second is an intensity modulation term, which should always be positive.
Similar to [Ruthotto et al., 2013], PyHySCO solves the inverse problem of estimating the field map b based on two observations, I_{+v} and I_{-v}, acquired with phase-encoding directions ±v, respectively. To this end, we estimate the field map b by minimizing the distance of the corrected images

D(b) = (1/2) ∫_Ω ( T[I_{+v}, b](x) - T[I_{-v}, b](x) )^2 dx.    (2)

The distance term is additionally regularized to enforce smoothness and the intensity modulation constraint. The smoothness regularization term,

S(b) = (1/2) ∫_Ω |∇b(x)|^2 dx,

penalizes large values of the gradient of b to ensure smoothness in all directions.
The intensity modulation constraint of the physical model requires that -1 < ∂_v b(x) < 1 for almost all x ∈ Ω. This is enforced by the barrier term

P(b) = ∫_Ω ϕ(∂_v b(x)) dx,    (3)

where ϕ is a convex barrier function that vanishes at the origin and grows to infinity as its argument approaches ±1. Altogether, this gives the optimization problem

min_b J(b) = D(b) + α S(b) + β P(b),    (4)

where the importance of the regularization terms is weighted with non-negative scalars α and β. Higher values of α promote a smoother field map, while lower values of α promote reduced distance between corrected images at the expense of smoothness in the field map. Any positive value for β ensures the intensity modulation constraint is satisfied, but lower values can lead to more ill-conditioned problems. For this paper, we fix α = 300 and β = 1e-4. PyHySCO follows the discretize-then-optimize paradigm commonly used in image registration; see, e.g., [Modersitzki, 2009]. PyHySCO discretizes the variational problem (4) as in [Macdonald and Ruthotto, 2017] to obtain a finite-dimensional optimization problem that is almost entirely separable in the phase encoding direction. Specifically, coupling is only introduced in the smoothness regularization term when calculating the gradient in the frequency encoding and slice selection directions.
Our convention is to permute the dimensions of the input image such that the phase encoding direction is aligned with the third unit vector e_3 = [0, 0, 1]^T. The field map is discretized on an e_3-staggered grid; that is, we discretize its values in the cell centers along the first two dimensions and on the nodes in the third dimension. The integrals in (4) are approximated by a midpoint quadrature rule. The input images are modeled by a one-dimensional piecewise linear interpolation function in the phase encoding direction. The geometric transformation is estimated in the cell centers with an averaging operator, and the intensity modulation is estimated in the cell centers with a finite difference operator.
The discretized smoothness regularization term is computed for the discretized field map b via

S(b) = (h_1 h_2 h_3 / 2) b^T H b,    (5)

where h_1, h_2, h_3 are the voxel sizes and H is a standard five-point discretization of the negative Laplacian and thus is a positive semi-definite operator. The discretized intensity modulation constraint term applies ϕ as defined in (3) element-wise to the result of a finite difference operator applied to the discretized field map. This gives the discretized optimization problem to solve as

min_b J(b) = D(b) + α S(b) + β P(b).    (6)

This problem is challenging to solve because it is high-dimensional and non-convex, but we can exploit the structure and separability to efficiently solve the problem using parallelization. The implementation of this optimization problem in a parallelizable way, as described in Section 2.4, includes choices of image interpolation, linear operators for averaging and finite difference, and regularization terms S and P.
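To make the staggered-grid discretization concrete, the following minimal sketch implements the averaging and finite-difference operators along the phase-encoding (last) dimension. The function names are illustrative and not PyHySCO's actual API; in PyHySCO these operators are realized as one-dimensional convolutions.

```python
import torch

# On an e3-staggered grid, the field map has n+1 nodal values along the
# phase-encoding direction, while images live on n cell centers.

def staggered_average(b):
    # map nodal field map values (..., n+1) to cell centers (..., n)
    return 0.5 * (b[..., 1:] + b[..., :-1])

def staggered_difference(b, h):
    # forward differences estimate the directional derivative (the
    # intensity modulation term) at the cell centers
    return (b[..., 1:] - b[..., :-1]) / h

b = torch.linspace(0.0, 1.0, 5)   # linear field map on 5 nodes, 4 cells
avg = staggered_average(b)
dif = staggered_difference(b, 0.25)
# for a linear field map the finite-difference derivative is constant
```

Because both operators only touch neighboring entries along the last dimension, they parallelize trivially across all image columns.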

Parallelized Initialization using Optimal Transport
Due to the non-convexity of the optimization problem (6), an effective initialization strategy for the field map is critical. To this end, PyHySCO introduces a novel initialization scheme based on techniques from optimal transport (OT) [Peyré et al., 2017]. The key idea is to compute the 'halfway' point of the oppositely distorted images in the Wasserstein space (as opposed to Euclidean space, which would simply average the images). To render this problem feasible, we treat each image column separately, use the closed-form solutions of 1D OT problems, and then apply a smoothing filter. We calculate these transformations as optimal transport maps [Peyré et al., 2017]. More specifically, because the distortions only occur in the phase encoding direction, these transformations are a set of one-dimensional maps calculated in parallel across the distortion dimension. One-dimensional optimal transport has a closed-form solution arising from considering the one-dimensional signal as a positive measure and constructing a cumulative distribution function [Peyré et al., 2017].
We describe the computation of the one-dimensional optimal transport maps in the distortion correction setting. In practice, the computation is parallelized in the distortion dimension to compute the entire initial field map simultaneously.
Let i_{+v} ∈ R^m be the image data from an entry in the phase encoding dimension of I_{+v}, and let i_{-v} ∈ R^m be the image data from the corresponding entry in the phase encoding dimension of I_{-v}. Let i_half denote the sequence of image intensity values from the corresponding entry of the undistorted image I. We numerically ensure i_{+v} and i_{-v} can be considered positive measures by applying a small shift to the image values, which does not change the relative distance between elements.
We initialize the field map using the optimal transport maps T_+ from i_{+v} to i_half and T_- from i_{-v} to i_half. These maps can be directly computed using the closed-form one-dimensional optimal transport formula, which depends on a cumulative distribution function and its pseudoinverse [Peyré et al., 2017]. The cumulative distribution function of a positive measure i is

C_i(k) = Σ_{j ≤ k} i(j),

where i(j) returns the pixel intensity value at index j of i. The pseudoinverse is

C_i^{-1}(t) = min { k : C_i(k) ≥ t }.

In practice, C_i^{-1} is computed using a linear spline interpolation. Returning to the measures arising from the input images, the closed-form solution for one-dimensional optimal transport gives the optimal transport map from i_{+v} to i_half as

T_+ = C_{i_half}^{-1} ∘ C_{i_{+v}},

and the optimal transport map from i_{-v} to i_half as

T_- = C_{i_half}^{-1} ∘ C_{i_{-v}}.

Figure 2 visualizes the computation of the one-dimensional transport maps, and the parallelized computation and resulting field maps are visualized in Figure 3. We thus compute the initial guess for the field map as the average of the maps T_+ and -T_-, computed in parallel. To introduce smoothness in the field map in the frequency encoding and slice selection dimensions, we apply a smoothing filter to the initial field map before optimization.
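The closed-form one-dimensional map can be sketched in a few lines: build the cumulative distributions of the two signals and evaluate the pseudoinverse by linear interpolation, as described above. The function names (`cdf`, `ot_map_1d`) are illustrative, not PyHySCO's API.

```python
import numpy as np

def cdf(i):
    # normalized cumulative distribution of a positive 1D signal
    c = np.cumsum(i)
    return c / c[-1]

def ot_map_1d(src, dst):
    # closed-form 1D OT map T = C_dst^{-1} o C_src; the pseudoinverse is
    # evaluated by linear interpolation of grid positions against C_dst
    grid = np.arange(len(src), dtype=float)
    return np.interp(cdf(src), cdf(dst), grid)

# a peak shifted by one pixel should be mapped (approximately) back onto it
src = np.array([0.1, 0.1, 1.0, 2.0, 1.0, 0.1, 0.1])
dst = np.array([0.1, 1.0, 2.0, 1.0, 0.1, 0.1, 0.1])  # src shifted left by 1
T = ot_map_1d(src, dst)
# in the bulk of the signal, T(x) is close to x - 1; in near-zero regions
# the map is ill-determined, which is one reason PyHySCO smooths the
# initial field map afterwards
```

In PyHySCO this computation runs for all image columns at once by stacking them along a batch dimension.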

Optimization Algorithms
Since the optimal choice of optimization algorithms for approximately solving (6) may depend on various factors, including image sizes, computational hardware, and severity of distortions, PyHySCO offers three options. Section 2.3.1 describes a Gauss-Newton scheme with a Jacobi-Preconditioned Conjugate Gradient (GN-PCG) method as an inner solver, which is similar to [Ruthotto et al., 2013] and is the default option. An option that exploits the parallelism of the discretization more effectively is the Alternating Direction Method of Multipliers (ADMM) in Section 2.3.2, which is based on [Macdonald and Ruthotto, 2017]. For comparison, we also provide an interface to an LBFGS optimizer; see Section 2.3.3.

Figure 3: The maps T_+ and T_- are calculated using the closed-form one-dimensional optimal transport solution, parallelized in the distortion dimension [Peyré et al., 2017]. Note the inverted coloring between T_+ and T_-, as the map T_- corrects a distortion in the opposite direction from T_+.

Gauss-Newton with Jacobi-Preconditioned Conjugate Gradient (GN-PCG)
The key idea is to linearize the residual in the distance term (2) and the nonlinear penalizer (3) about the k-th iterate b_k and approximately solve the resulting quadratic problem with a few iterations of the PCG method.
More precisely, let ∇J be the gradient and H_J be a positive definite approximation of the Hessian of the optimization problem (6) about b_k. Gauss-Newton iteratively updates the current field map estimate via

b_{k+1} = b_k + γ_k q_k,

where the step size γ_k is determined with a line search method such as Armijo [Nocedal and Wright, 1999, Ch. 3, pp. 33-36] and the search direction q_k approximately satisfies

H_J q_k = -∇J(b_k).

To obtain q_k, we apply up to 10 iterations of the preconditioned conjugate gradient (PCG) method and stop early if the relative residual is less than 0.1; see the original work [Hestenes and Stiefel, 1952] or the textbook [Saad, 2003] for more details on PCG. The performance of PCG crucially depends on the clustering of the eigenvalues, which a suitable preconditioner can often improve. As a computationally inexpensive and often effective option, we implement a Jacobi preconditioner, which approximates the inverse of H_J by the inverse of its diagonal entries. Instead of constructing the matrix H_J, which is computationally expensive, we provide efficient algorithms to compute matrix-vector products and extract its diagonal. While the diagonal preconditioner works well in our examples, we note that a more accurate (yet also more expensive) block-diagonal preconditioner has been proposed in [Macdonald and Ruthotto, 2017].
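The inner solve can be sketched as a matrix-free PCG loop with a Jacobi preconditioner and an early-stopping test on the relative residual, mirroring the 10-iteration/0.1-tolerance setting described above. This is a generic sketch with illustrative names, not PyHySCO's PCG class.

```python
import numpy as np

def jacobi_pcg(matvec, diag, rhs, max_iter=10, tol=0.1):
    # solve A x = rhs with A given only through matvec; the Jacobi
    # preconditioner divides by the diagonal of A
    x = np.zeros_like(rhs)
    r = rhs.copy()                 # residual for the zero initial guess
    z = r / diag                   # apply the diagonal preconditioner
    p = z.copy()
    rz = r @ z
    rhs_norm = np.linalg.norm(rhs)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * rhs_norm:
            break                  # early stop on the relative residual
        z = r / diag
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# small SPD test system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
x = jacobi_pcg(lambda v: A @ v, np.diag(A), np.array([1.0, 2.0]),
               max_iter=50, tol=1e-10)
```

The matrix-free design matters here: only matrix-vector products with the Gauss-Newton Hessian and its diagonal are ever needed, which avoids assembling H_J.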

Alternating Direction Method of Multipliers (ADMM)
We additionally modify the ADMM algorithm [Boyd et al., 2011] of [Macdonald and Ruthotto, 2017] and implement it in PyHySCO. To take advantage of the separability of the objective function, the idea is to split the optimization problem into two subproblems. In contrast to [Macdonald and Ruthotto, 2017], which uses a hard constraint to ensure positivity of the intensity modulation and employs Sequential Quadratic Programming, we implement this as a soft constraint with the barrier term (3).
As in [Macdonald and Ruthotto, 2017], we split the objective in (6) into

J(b) = f(b) + g(b),  with  f(b) = D(b) + β P(b) + α S_3(b)  and  g(b) = α S_1(b) + α S_2(b),    (8)

where S_3 is the part of the smoothness regularization term S corresponding to the phase encoding direction, and S_1 and S_2 are the remaining terms corresponding to the other directions. This gives rise to the following optimization problem, equivalent to (6):

min_{b,z} f(b) + g(z)  subject to  b = z,

with the corresponding augmented Lagrangian

L_ρ(b, z, y) = f(b) + g(z) + y^T (b - z) + (ρ/2) ||b - z||^2,

where y is the Lagrange multiplier for the equality constraint b = z and ρ is a scalar augmentation parameter. Using the scaled Lagrange multiplier u = y / (ρ h_3), each iteration has the updates

b^{k+1} = argmin_b f(b) + (ρ/2) ||b - z^k + u^k||^2,    (9)
z^{k+1} = argmin_z g(z) + (ρ/2) ||b^{k+1} - z + u^k||^2,    (10)
u^{k+1} = u^k + b^{k+1} - z^{k+1}.    (11)

The b update computed in (9) involves a separable optimization problem that can be solved independently for each image column along the phase-encoding direction. In PyHySCO we use a modified version of the GN-PCG scheme described above. The only change is the computation of the search direction, which can now be parallelized across the different image columns. To exploit this structure, we implement a PCG method that solves the system for each image column in parallel. In addition to more parallelism, we observe an increase in efficiency since the scheme uses different step sizes and stopping criteria for each image column.
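As a toy illustration of the scaled-form iteration behind the b update (9), the z update (10), and the dual update, the sketch below runs ADMM on a simple quadratic splitting where both subproblems have closed-form minimizers. In PyHySCO the b update is a Gauss-Newton solve and the z update an FFT-based solve; all names here are illustrative.

```python
import numpy as np

def admm_quadratic(a, lam, rho=1.0, iters=100):
    # minimize f(b) + g(z) s.t. b = z, with
    # f(b) = 0.5*||b - a||^2 and g(z) = 0.5*lam*||z||^2;
    # the true minimizer is b = z = a / (1 + lam)
    b = np.zeros_like(a)
    z = np.zeros_like(a)
    u = np.zeros_like(a)          # scaled dual variable
    for _ in range(iters):
        b = (a + rho * (z - u)) / (1.0 + rho)   # argmin_b f + (rho/2)||b - z + u||^2
        z = rho * (b + u) / (lam + rho)         # argmin_z g + (rho/2)||b - z + u||^2
        u = u + b - z                           # dual ascent on b = z
    return b

b = admm_quadratic(np.array([2.0]), lam=1.0)
# b converges toward a / (1 + lam) = 1.0
```

The same three-step pattern carries over to (6): only the two `argmin` lines change, and each is where the structure (column separability, FFT-diagonalizability) is exploited.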
The z update is computed by solving the quadratic problem (10) directly. This is enabled by the structure of the associated linear system, which is block-diagonal, and each block is given by a 2D negative Laplacian (from the regularizers) shifted by an identity (from the proximal term). Assuming periodic boundary conditions on the images, the blocks themselves have an exploitable structure (Block Circulant with Circulant Blocks in [Hansen et al., 2006]) and, therefore, can be inverted efficiently with the Fast Fourier Transform (FFT).
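The FFT inversion of such a block can be sketched directly: with periodic boundary conditions, the shifted 2D negative Laplacian is diagonalized by the 2D FFT, so the solve reduces to an element-wise division in Fourier space. The function names are illustrative.

```python
import numpy as np

def solve_shifted_laplacian(rhs):
    # solve (I + L) x = rhs, where L is the periodic 5-point negative
    # Laplacian; the FFT diagonalizes L, so the solve is O(n log n)
    n1, n2 = rhs.shape
    k1 = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n1) / n1)
    k2 = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n2) / n2)
    eigs = 1.0 + k1[:, None] + k2[None, :]     # identity shift keeps eigs > 0
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / eigs))

def apply_operator(x):
    # (I + L) x with periodic neighbors via np.roll, used here to verify
    lap = 4.0 * x - np.roll(x, 1, 0) - np.roll(x, -1, 0) \
              - np.roll(x, 1, 1) - np.roll(x, -1, 1)
    return x + lap

rhs = np.random.default_rng(0).standard_normal((8, 8))
x = solve_shifted_laplacian(rhs)
# apply_operator(x) reproduces rhs up to floating-point error
```

Note that the identity shift from the proximal term keeps every eigenvalue strictly positive, so the division is always well defined.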
The augmentation parameter ρ is updated adaptively as described in [Boyd et al., 2011] to keep the relative primal and dual residuals close.

LBFGS
As a comparison, we provide an implementation of LBFGS [Liu and Nocedal, 1989], although optimization with LBFGS does not exploit any of the structure or separability of the optimization problem. LBFGS is a quasi-Newton method that uses an estimate of the objective function's Hessian based on a limited number of previous iterations when solving for the search direction [Liu and Nocedal, 1989]. In our implementation, we provide an explicitly calculated derivative to an LBFGS solver. When computing the objective function, we precompute parts of the derivative, which allows for faster optimization than relying on automatic differentiation.
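Driving PyTorch's LBFGS solver follows the standard closure pattern, sketched below on a toy quadratic. In PyHySCO the closure would return the objective (6) and supply the explicitly computed gradient rather than calling `backward()`; the variable names here are illustrative.

```python
import torch

# toy objective: 0.5 * ||b - target||^2, minimized by b = target
b = torch.zeros(3, requires_grad=True)
target = torch.tensor([1.0, -2.0, 0.5])
opt = torch.optim.LBFGS([b], max_iter=50, line_search_fn="strong_wolfe")

def closure():
    opt.zero_grad()
    loss = 0.5 * torch.sum((b - target) ** 2)
    # a hand-coded gradient could be assigned to b.grad here instead of
    # relying on automatic differentiation, as PyHySCO does
    loss.backward()
    return loss

opt.step(closure)
```

A single `opt.step(closure)` runs up to `max_iter` internal LBFGS iterations, which is why one call suffices for this small problem.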

Coding Structure of PyHySCO
We implemented PyHySCO in PyTorch [Paszke et al., 2019] following the overall code structure visualized in the diagrams in Figures 4a and 4b for the objective function and optimization, respectively. The main classes of PyHySCO are the loss function, implemented in EPIMRIDistortionCorrection, and the optimization, defined in EPIOptimize. The other classes and methods, described in detail below, implement the components of the loss function evaluation and optimization schemes.

Data Storage and Image Model
The input pair of images with opposite phase encoding directions is loaded and permuted such that the distortion dimension is last, as this is where PyTorch expects the batch dimension for parallelizing operations. Information on the input images is stored in an object of type DataObject. This class stores information on the image size, domain, voxel size, how to permute the data back to the input order, and the ImageModel for each input image. The ImageModel abstract class defines the structure and required methods for an image model, including storing the original data and providing a method eval that returns the data interpolated at the given points. We provide the default implementation Interp1D, a piecewise linear one-dimensional interpolation parallelized in the last dimension. The DataObject for a given input pair is then stored in the EPIMRIDistortionCorrection object.
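The permutation bookkeeping can be sketched as follows: move the phase-encoding dimension to the end and keep the inverse permutation so results can be written back in the input order. This is a generic sketch, not DataObject's actual code.

```python
import torch

# suppose the phase encoding is along dimension 0 of a 3D volume
vol = torch.randn(64, 80, 50)
perm = (1, 2, 0)                                  # move dim 0 to the end
inv_perm = tuple(torch.tensor(perm).argsort().tolist())

permuted = vol.permute(perm)                      # shape (80, 50, 64)
restored = permuted.permute(inv_perm)             # back to the input order
```

With the distortion dimension last, all of the 1D interpolation, averaging, and differencing operators can treat the leading dimensions as a batch.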

Correction Model
The mass-preserving correction model (1) is implemented in the method mp_transform, a class method of EPIMRIDistortionCorrection. The method takes as input an ImageModel and a field map. The geometric deformation is computed by using an averaging LinearOperator to compute the field map values in the cell centers and adding these to a cell-centered grid to obtain the deformed grid defined by the field map. Using the ImageModel, the image is interpolated on this deformed grid. The intensity modulation term is computed using a finite difference LinearOperator. The two terms are multiplied together element-wise before returning the corrected image. The default implementations of the LinearOperator objects for averaging and finite difference are given as one-dimensional convolutions, parallelized in the last dimension.
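A one-dimensional sketch of this pipeline combines the two operators with a linear interpolation, following model (1). This is an illustrative simplification, not PyHySCO's actual mp_transform.

```python
import torch

def mp_transform_1d(image, b_nodal, h=1.0):
    # image: (n,) intensities at cell centers; b_nodal: (n+1,) nodal field map
    n = image.shape[-1]
    centers = torch.arange(n, dtype=image.dtype)
    b_cc = 0.5 * (b_nodal[1:] + b_nodal[:-1])    # averaging to cell centers
    db = (b_nodal[1:] - b_nodal[:-1]) / h        # finite-difference derivative
    x = centers + b_cc                           # deformed grid
    # piecewise linear interpolation of the image at the deformed points
    x0 = torch.clamp(x.floor().long(), 0, n - 1)
    x1 = torch.clamp(x0 + 1, 0, n - 1)
    w = x - x0.to(image.dtype)
    interp = (1 - w) * image[x0] + w * image[x1]
    return interp * (1.0 + db)                   # intensity modulation

img = torch.tensor([0.0, 1.0, 2.0, 1.0, 0.0])
out = mp_transform_1d(img, torch.zeros(6))       # zero field map: identity
```

With a zero field map the deformed grid coincides with the cell centers and the modulation factor is one, so the transform returns the input unchanged.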

Regularization Terms
The intensity regularization term is computed within the EPIMRIDistortionCorrection class in the method phi_EPI, which computes the result of applying ϕ as defined in (3) element-wise to the result of applying the finite difference operator to the field map, as computed in the correction model. This function acts as a barrier term, ensuring that the derivative of the field map in the distortion dimension is in the range (-1, 1).
The smoothness regularization term is implemented in a QuadRegularizer object, which defines the evaluation of a quadratic regularization term of the form of (5) using a positive semi-definite LinearOperator as H. By default, H is a discretized negative Laplacian applied via a three-dimensional convolution.
In the ADMM optimizer, the regularizer structure differs to account for the splitting in (8). The objective function for the b update in (9) is computed in EPIMRIDistortionCorrection, where the computation of S_3 is a one-dimensional Laplacian in the distortion dimension applied via a one-dimensional convolution. The proximal term is computed through a TikRegularizer object, a Tikhonov regularizer structure. The objective function for the z update in (10) is a QuadRegularizer object where the LinearOperator H is a two-dimensional Laplacian corresponding to S_1 and S_2. This operator is implemented in FFT3D, which defines an operator applying a convolution kernel diagonalized in Fourier space [Cooley et al., 1969]. This implementation allows for easily inverting the kernel when solving for z.

Hessian and Preconditioning
For the Gauss-Newton and ADMM optimizers, an approximate Hessian and preconditioner are additionally computed. Parts of the Hessian are computed in EPIMRIDistortionCorrection during objective function evaluation, and the Hessian can be applied through a matrix-vector product. Similarly, a Preconditioner can be computed during objective function evaluation and is accessible through a returned function applying the preconditioner to its input. By default, we provide a Jacobi preconditioner in the class JacobiCG.

Initialization
The EPIMRIDistortionCorrection class has a method initialize, returning an initial guess for the field map using some InitializationMethod. We provide an implementation of the proposed parallelized optimal transport-based initialization in InitializeOT. The implementation computes the one-dimensional transport maps in parallel using a linear spline interpolation. In practice, the parallelized initialization gives a highly non-smooth initial field map, so the method optionally applies a blurring operator using a 3-by-3-by-3 Gaussian kernel with a standard deviation of 1.0 to promote a smoother optimized field map. Applying the blur to the field map is implemented using the fast FFT convolution operator FFT3D.
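The smoothing step can be sketched with a direct 3D convolution against a normalized 3x3x3 Gaussian kernel with standard deviation 1.0 (PyHySCO instead applies the kernel via its FFT operator); the function names are illustrative.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel_3d(sigma=1.0):
    # separable 3x3x3 Gaussian, normalized to sum to one
    ax = torch.arange(-1.0, 2.0)                 # offsets {-1, 0, 1}
    g = torch.exp(-ax ** 2 / (2.0 * sigma ** 2))
    g = g / g.sum()
    k = g[:, None, None] * g[None, :, None] * g[None, None, :]
    return k.view(1, 1, 3, 3, 3)

def blur_field_map(b):
    # b: (D, H, W) field map; replicate padding keeps a constant map constant
    x = F.pad(b[None, None], (1, 1, 1, 1, 1, 1), mode="replicate")
    return F.conv3d(x, gaussian_kernel_3d())[0, 0]

b = torch.full((4, 4, 4), 2.5)
smoothed = blur_field_map(b)                     # constant map is unchanged
```

Because the kernel is normalized, the blur redistributes field map values locally without changing their overall scale.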

Optimization
The minimization of the objective function defined in an EPIMRIDistortionCorrection object happens in a subclass of EPIOptimize, which takes the objective function object as input. During optimization, the OptimizationLogger class is used to track the iteration history, saving it to a log file and optionally printing this information to standard output. PyHySCO includes implementations of the LBFGS, Gauss-Newton, and ADMM solvers described previously. Each of the classes LBFGS, GaussNewton, and ADMM provides a run_correction method that minimizes the objective function using the indicated optimization scheme. The LBFGS implementation uses the explicitly computed derivative from EPIMRIDistortionCorrection. For LBFGS, we stop when the norm of the gradient reaches a given tolerance or when the change in the loss function or field map between iterations falls below a given tolerance. The GaussNewton implementation uses a conjugate gradient solver implemented in the class PCG and the same stopping criteria as LBFGS. The ADMM implementation solves the b update in (9) using GaussNewton with a parallelized conjugate gradient solver in BlockPCG. The z update in (10) is solved directly through the inverse method inv of the operator used to define the QuadRegularizer for this term, efficiently implemented using FFTs in FFT3D. The ADMM iterations terminate if the changes in all of b, z, and u from the previous iteration fall below a given tolerance.

Image Correction
The optimal field map, stored as Bc in the EPIOptimize object after run_correction completes, can be used to produce a corrected image or pair of images. The apply_correction method of EPIOptimize implements both a Jacobian modulation correction and a least squares correction. The Jacobian modulation correction is based on the model of [Chang and Fitzpatrick, 1992] as implemented in the mp_transform method of EPIMRIDistortionCorrection. This correction method computes and saves two corrected images, one for each input image. The field map can also be used in a least squares correction similar to the correction in [Andersson et al., 2003], implemented in LeastSquaresCorrection. In this correction, the estimated field map determines a pushforward matrix that maps the true image to the distorted images given as input. This gives rise to a least squares problem for the true image given the input images and the pushforward matrix.

PyHySCO Usage and Workflow
The workflow of PyHySCO is illustrated in Figure 5a alongside examples of using PyHySCO in a Python script (Figure 5b) and through the command line (Figure 5c). Running PyHySCO from a user-defined Python script allows more specific control of the inputs and outputs of PyHySCO methods. The command-line interface allows the user to pass configuration options directly from the command line, which enables our EPI distortion correction tool to be easily used as part of existing command-line-based MRI post-processing pipelines such as the FSL toolbox [Smith et al., 2004]. Executing PyHySCO requires the user to provide, at a minimum, the file paths for the input pair of images with opposite phase encoding directions and which dimension (1, 2, or 3) is aligned with the phase encoding direction. The modularity of PyHySCO additionally allows for configuring options such as the scalar hyperparameters in (6); the implementations of operators, regularizers, and interpolation; the optimizer and associated optimization parameters; and the image correction method.
Regardless of execution through a script or the command line, PyHySCO stores the input images in a DataObject object, the loss function in an EPIMRIDistortionCorrection object, and the optimizer in an object of a subclass of EPIOptimize. The field map is initialized by the method initialize in EPIMRIDistortionCorrection and optimized by calling the method run_correction on the optimizer object. Finally, the method apply_correction in EPIOptimize applies the field map to correct the input images and saves the result to one or more NIFTI file(s).

Results
We demonstrate PyHySCO's effectiveness through extensive experiments using real and simulated data from the Human Connectome Project [Van Essen et al., 2012] and validate the novel initialization scheme and the implementations of the optimization algorithms. Section 3.1 describes the datasets, and Section 3.2 contains our ethics statement. Section 3.3 introduces our evaluation metrics. The experiments in Section 3.5 compare the performance of the three optimization algorithms implemented in PyHySCO on CPU and GPU hardware. In Section 3.6, we empirically show that PyHySCO can be effective in single precision, which accelerates computation on modern GPU hardware. The results in Section 3.7 suggest that PyHySCO achieves correction quality comparable to HySCO and TOPUP [Ruthotto et al., 2013, Andersson et al., 2003] at a considerably shorter runtime.

Validation Datasets
The data used in the following experiments is from the Human Connectome Project [Van Essen et al., 2012]. We validate our methods and tool on 3T and 7T diffusion-weighted imaging data from the HCP 1200 Subjects Release, with 20 subjects randomly chosen for each field strength. Table 1 provides details of the datasets.
We also evaluate our methods on simulated data. This data only contains susceptibility artifact distortions, so it shows how our tool performs without the influence of other factors, e.g., patient movement between scans. To simulate the distortions, we use a pair of magnitude and phase images for a subject in HCP and generate the field map using FSL's FLIRT and PRELUDE tools [Smith et al., 2004]. Following the physical model of [Chang and Fitzpatrick, 1992], the field map b defines push-forward matrices that describe how the intensity value at x is pushed forward to x + b(x) in the distortion direction +v as well as the opposite direction -v. Applying the push-forward matrices to a T2-weighted image for the subject, we generate a pair of distorted images. For the simulated data, we thus have a reference value for the field map and an undistorted, true image.

Ethics Statement
No new data were collected specifically for this paper. The Human Connectome Project, through the WU-Minn HCP Consortium, obtained written informed consent from all participants in the data collection study [Van Essen et al., 2012].

Figure 6: Field maps from the different initialization schemes, before and after optimization. The left column uses the parallelized optimal transport initialization; the middle column uses the same scheme with an additional Gaussian blur to promote smoothness. The right column uses the coarse-to-fine multilevel initialization scheme from HySCO with five levels, and the final field map is optimized at the original image resolution. The multilevel-initialized field map is smooth by construction and further optimized to improve the relative image distance at the full resolution. The optimal transport initialization accurately corrects the distortions but is not smooth in the non-distortion dimensions unless blurred with a Gaussian. After the fine-level optimization, all field maps are visually similar.

Metrics and Comparison Methods
The quality of correction results is measured using the relative improvement of the distance between a pair of corrected images. Particularly, we calculate the sum-of-squares distance (SSD) of the corrected image pair relative to the SSD of the input pair. This metric is a useful surrogate for the correctness of the field map in the absence of a ground truth [Graham et al., 2017]. Additionally, we take the value of the smoothness regularization term S(b) as a measure of how smooth the resulting field map is, with lower values being better.
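As a concrete reading of this metric, the sketch below computes the relative improvement from the SSDs of the input and corrected pairs. The function name is ours for illustration, not part of PyHySCO's API.

```python
import torch

def relative_improvement(input_pair, corrected_pair):
    """1 - SSD(corrected pair) / SSD(input pair); higher is better,
    and 1.0 means the corrected images agree exactly."""
    ssd_in = torch.sum((input_pair[0] - input_pair[1]) ** 2)
    ssd_out = torch.sum((corrected_pair[0] - corrected_pair[1]) ** 2)
    return 1.0 - ssd_out / ssd_in

# A toy pair: correction removes all of the distance between the images.
I_pv, I_mv = torch.tensor([0.0, 2.0]), torch.tensor([2.0, 0.0])
I_c = torch.tensor([1.0, 1.0])
improvement = relative_improvement((I_pv, I_mv), (I_c, I_c))
```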
We report the runtime in seconds of PyHySCO. The runtime is measured as the wall clock time using the Linux time command when calling the correction method from the command line. This time, therefore, includes the time taken to load and save the image data. In some cases, we also report the optimization time only, without loading and saving data, as measured by Python's time module.
We compare the runtime, relative improvement, and resulting images after correction using PyHySCO against those given by TOPUP [Andersson et al., 2003] as implemented in FSL [Smith et al., 2004] using the default configuration, and HySCO [Ruthotto et al., 2013] as implemented in the ACID toolbox for SPM using the default parameters. HySCO is also based on the optimization problem (6), while TOPUP uses a slightly different objective function. This makes it infeasible to directly compute smoothness and loss function values for TOPUP.

Validity of Optimal Transport Initialization
We compare the results of PyHySCO using our optimal transport initialization to those of the multi-level initialization used in HySCO [Ruthotto et al., 2013], both at initialization and after optimization with Gauss-Newton. The multi-level optimization of HySCO solves the optimization problem on a coarse grid and uses the result as the initialization of optimization on a finer grid, continuing until reaching the original image resolution; this follows the guidelines of [Modersitzki, 2009, 9.4]. In our experiments, we use five levels in the initialization. The multi-level initialization gives a field map that is smooth by construction and improves the distance reduction as the grid becomes finer. The field map from the PyHySCO optimal transport initialization lowers the relative error between the images, with a relative improvement of over 96% on real data and 94% on simulated data. However, the parallelized one-dimensional computations lead to a lack of smoothness in the resulting field map. The smoothness can be improved by applying a Gaussian blur to the field map from the optimal transport initialization. This field map is smoother after initialization and gives a smoother field map after optimization. These results are comparable in relative error and smoothness to the field map optimized from the multilevel initialization of HySCO. Our one-dimensional parallelized optimal transport, even with the additional Gaussian blur, is much faster to compute than the multilevel initial field map given the ability to parallelize computations. PyHySCO initialization on a GPU with the additional blur takes less than 1 second on real data and about 3 seconds on simulated data. In comparison, the multi-level initialization on a CPU takes 30 to 40 seconds on real data and over 2 minutes on simulated data. The mean and standard deviation of the relative improvement, smoothness value, loss function value, and runtime are reported in Table 2 across all datasets. Examples of these field maps before and after optimization are shown in Figure 6.
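The Gaussian blur step can be sketched as a separable 1D convolution applied along the non-distortion dimensions. The kernel size and sigma below are illustrative defaults, not PyHySCO's settings.

```python
import torch
import torch.nn.functional as F

def gaussian_blur_along(fieldmap: torch.Tensor, dim: int,
                        sigma: float = 1.0, ksize: int = 5) -> torch.Tensor:
    """Blur a 3D field map along one dimension with a normalized 1D Gaussian."""
    x = torch.arange(ksize, dtype=fieldmap.dtype) - ksize // 2
    k = torch.exp(-0.5 * (x / sigma) ** 2)
    k = (k / k.sum()).view(1, 1, -1)  # conv1d weight: (out_ch, in_ch, length)
    b = fieldmap.movedim(dim, -1)
    shape = b.shape
    # Treat all other dimensions as a batch of 1D lines and convolve each.
    out = F.conv1d(b.reshape(-1, 1, shape[-1]), k, padding=ksize // 2)
    return out.reshape(shape).movedim(-1, dim)

# Smooth a field map along both non-distortion dimensions (distortion dim = 0):
fieldmap = torch.randn(8, 8, 8)
smoothed = gaussian_blur_along(gaussian_blur_along(fieldmap, 1), 2)
```

Because each line is independent, the convolutions parallelize across the volume in the same batched fashion as the initialization itself.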

Comparison of PyHySCO Optimizers on GPU and CPU
We compare the results of PyHySCO using GN-PCG, ADMM, and LBFGS on both GPU and CPU architectures. Table 3 shows the runtimes and correction quality of each optimizer on CPU and GPU. All optimizers achieve a similar correction quality with respect to relative improvement of image distance, loss value, and smoothness regularizer value, but GN-PCG has a faster runtime on both CPU and GPU. On real data, GN-PCG took 10-13 seconds on average on GPU and 27-31 seconds on average on CPU, while ADMM took 11-15 seconds on GPU and 98-158 seconds on CPU, and LBFGS took 23-36 seconds on GPU and 104-141 seconds on CPU. Table 4 shows optimization metrics, including the number of iterations, stopping criteria, number of function evaluations, number of Hessian evaluations, and number of inner iterations if applicable. Consistent with its faster runtime, optimization with GN-PCG achieves a similar loss value with less computation as measured by function and Hessian evaluations. Figures 7, 8, and 9 show the field map and corrected images for each optimizer for one example subject from each dataset. The field maps and corrected images are visually similar across optimizers.

Table 2: Validation of the optimal transport initialization. We compare the runtime, relative improvement, smoothness value, and loss function value at initialization and after optimization with Gauss-Newton for the proposed parallelized optimal transport initialization, the proposed initialization with an additional Gaussian blur, and the multilevel initialization used in HySCO [Ruthotto et al., 2013]. For each metric, we report the mean and standard deviation on the 3T, 7T, and simulated datasets. The multilevel initialization is timed on CPU in Matlab, and the optimal transport initializations and all optimizations are timed on GPU in Python. The optimal transport based initializations provide comparable quality while decreasing runtime compared to the multilevel initialization, and the optimal transport with Gaussian blur promotes a smoother field map.

Single Precision vs Double Precision on GPU and CPU
We show the validity of PyHySCO using optimal transport initialization and GN-PCG in both double precision (64-bit) and single precision (32-bit) arithmetic on three different GPU architectures and a CPU architecture. Since GPU architectures are optimized for the speed of lower-precision calculations, we see a significant speedup when using single precision instead of double precision. Calculations in single precision, however, risk lower accuracy or propagating errors due to using fewer bits to approximate floating point values. Empirically, we see that the quality of our results is not significantly impacted by using single-precision arithmetic. We also see consistent results across different GPU architectures: a Quadro RTX 8000, Titan RTX, and RTX A6000. Because PyHySCO is optimized to parallelize computations on GPU, the runtimes are faster on the GPUs compared to the Intel Xeon E5-4627 CPU.
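The accuracy side of this trade-off can be illustrated on a small, well-conditioned linear solve (a toy example of ours, unrelated to PyHySCO's operators): the single-precision result typically matches the double-precision one to several decimal digits.

```python
import torch

torch.manual_seed(0)
n = 256
A = torch.randn(n, n, dtype=torch.float64)
A = A @ A.T + n * torch.eye(n, dtype=torch.float64)  # well-conditioned SPD system
x_true = torch.randn(n, dtype=torch.float64)
rhs = A @ x_true

x64 = torch.linalg.solve(A, rhs)                  # double-precision solve
x32 = torch.linalg.solve(A.float(), rhs.float())  # same solve in single precision

# Relative difference between the two solutions is near float32 round-off.
rel_err = (x32.double() - x64).norm() / x64.norm()
```

For ill-conditioned problems the gap widens, which is why PyHySCO evaluates the final quality metrics in double precision.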

Comparison of PyHySCO with HySCO and TOPUP
Table 6 reports the runtime and correction quality for PyHySCO using GN-PCG, HySCO, and TOPUP.
On real 3T and 7T data, PyHySCO achieves lower loss and higher relative improvement between corrected images than HySCO, and higher relative improvement than TOPUP. The runtime on CPU for real data is 1-2 minutes for HySCO and over 1 hour for TOPUP, while PyHySCO on GPU has runtimes of 10-13 seconds.
For the simulated dataset, PyHySCO requires an average of 1 minute on GPU, HySCO an average of 12 minutes on CPU, and TOPUP an average of 8.5 hours on CPU. Using the ground truth field maps from the simulated dataset, PyHySCO achieves the lowest average field map relative error, 14.48%, compared to 19.70% for HySCO and 16.36% for TOPUP. Figures 7, 8, and 9 show the field map and corrected images for one example subject from each dataset. The results of the methods are similar, and the resulting field maps are comparable to those of the existing tools, HySCO and TOPUP, while PyHySCO is considerably faster.

Table 3: The speed and quality of optimization in PyHySCO on GPU and CPU with LBFGS, Gauss-Newton, and ADMM. We report, for each dataset and optimizer, the mean and standard deviation of the total runtime (including loading and saving data), optimization time, improvement in distance between corrected images relative to the input images, loss value, and smoothness regularizer value. Gauss-Newton achieves a similar correction quality in less time than LBFGS or ADMM on both CPU and GPU.

Discussion
The PyHySCO toolbox accurately and robustly corrects susceptibility artifacts in EPIs acquired using the Reverse Gradient Polarity acquisition. In numerous experiments with real and simulated data, it achieves similar correction quality to the leading RGP toolboxes TOPUP and HySCO while having a time-to-solution on the order of timings reported for pre-trained deep learning approaches. Compared to the latter class of methods, it is important to highlight that PyHySCO does not require any training and is based on a physical distortion model, which helps it generalize to different scanners, image acquisition parameters, and anatomies. PyHySCO's modular design invites improvements and contributions. The toolbox is based on PyTorch, which provides hardware support and other functionality, including automatic differentiation. In our experiments, correction quality is hardware- and precision-independent, but a considerable speedup is realized on GPUs with single precision (32-bit) arithmetic. The reduced computational time is mostly attributed to the effective use of multithreading and parallelism on modern hardware.
PyHySCO introduces a novel, fast, and effective optimal transport-based initialization scheme. The initial estimate of the field map already substantially reduces the distance between the images with opposite phase encoding directions. In our experiments, the non-smoothness of the initial field map can be corrected by applying a Gaussian blur and a few optimization steps at the full image resolution.
PyHySCO's three optimization algorithms achieve comparable correction results but have different computational costs. The ADMM algorithm takes advantage of the separable structure of the optimization problem to enhance parallelism but requires more iterations than GN-PCG. While this results in longer runtimes in our examples, the method could be more scalable for datasets of considerably higher resolution. For the relatively standard image sizes of about 200 × 200 × 132, the default GN-PCG algorithm is most effective. Both customized optimization algorithms are more efficient than our comparison, LBFGS.
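For readers unfamiliar with the inner solver of GN-PCG, a minimal preconditioned conjugate gradient iteration looks like the following. This is a generic textbook sketch (shown here with a Jacobi preconditioner in the usage example), not PyHySCO's implementation.

```python
import torch

def pcg(A_op, rhs, M_inv, max_iter=100, tol=1e-10):
    """Preconditioned conjugate gradient for a symmetric positive definite
    operator A_op; M_inv applies the inverse of the preconditioner."""
    x = torch.zeros_like(rhs)
    r = rhs - A_op(x)        # residual
    z = M_inv(r)             # preconditioned residual
    p = z.clone()            # search direction
    rz = r @ z
    for _ in range(max_iter):
        if r.norm() < tol:
            break
        Ap = A_op(p)
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Solve a small SPD system with a Jacobi (diagonal) preconditioner:
A = torch.tensor([[4.0, 1.0], [1.0, 3.0]], dtype=torch.float64)
b = torch.tensor([1.0, 2.0], dtype=torch.float64)
x = pcg(lambda v: A @ v, b, lambda r: r / A.diag())
```

Because A_op is only ever applied to vectors, the same loop works with matrix-free Gauss-Newton Hessian approximations, which is what makes GN-PCG memory-efficient at full image resolution.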
PyHySCO can be interfaced directly in Python or run in batch mode via the command line. The latter makes it a drop-in replacement for other RGP tools in MRI post-processing pipelines.

Table 4: Details of optimization for the PyHySCO optimizers LBFGS, Gauss-Newton, and ADMM. For each dataset, we report the average and standard deviation of the number of iterations, the count of stopping criteria used (gradient tolerance / loss function change tolerance / field map change tolerance / maximum iterations), the number of function evaluations, the number of Hessian evaluations, the number of inner iterations, and the loss value. Gauss-Newton achieves a similar quality of correction with less computation than LBFGS or ADMM.

Conclusion
PyHySCO offers RGP-based correction with high accuracy at a cost similar to pre-trained learning-based methods. Our implementation is based on PyTorch and makes efficient use of modern hardware accelerators such as GPUs. We show the accuracy and efficiency of PyHySCO on real and simulated three-dimensional volumes of various field strengths and phase encoding axes. Our results show that PyHySCO achieves a correction of comparable quality to leading physics-based methods in a fraction of the time.

Table 5: The speed and quality of PyHySCO optimization with Gauss-Newton on three different GPUs and a CPU, in both single (float32) and double (float64) precision arithmetic. The relative improvement, loss value, and smoothness value are evaluated in double precision in all cases. Results are shown for both 3T and 7T data from the Human Connectome Project [Van Essen et al., 2012] and simulated data. There is a substantial speedup when calculating in single precision without losing correction quality, and the speedup of PyHySCO on a GPU is clear compared to the CPU.

Figure 1:
Figure 1: The Reverse Gradient Polarity correction paradigm. Two images are acquired with opposite phase encoding directions, +v and −v. These two images are used to estimate the field map b, and the distortion correction model [Chang and Fitzpatrick, 1992] is applied to obtain a corrected image I.

Figure 2:
Figure 2: Example of one-dimensional optimal transport maps. The top left shows an example of one-dimensional measures. The green signal, i_+v, corresponds to an intensity pileup in I_+v, while the purple signal, i_−v, corresponds to an intensity dispersion in I_−v. The red signal corresponds to the intensity of the true image. The top right shows the cumulative distributions for the measures i_+v and i_−v. The bottom left shows the pseudoinverses for i_+v and i_−v along with the pseudoinverse C_half^{-1} used in calculating the transport maps T_+ = C_half^{-1} ∘ C_+v and T_− = C_half^{-1} ∘ C_−v, shown bottom right.
The map T_+, mapping from I_+v halfway to I_−v, is calculated as the composition of the cumulative distribution function C_+v from I_+v and the interpolated pseudoinverse C_half^{-1}. The map T_−, mapping from I_−v halfway to I_+v, is calculated as the composition of the cumulative distribution function C_−v from I_−v and the interpolated pseudoinverse C_half^{-1}.
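This construction can be sketched directly from the definitions: compute the two CDFs, average their pseudoinverses to obtain C_half^{-1}, and compose. The function names and interpolation details below are our illustrative choices, not PyHySCO's implementation.

```python
import torch

def cdf(i):
    """Normalized cumulative distribution of a nonnegative 1D signal."""
    return torch.cumsum(i / i.sum(), dim=0)

def cdf_inv(C, x, q):
    """Pseudoinverse C^{-1}(q) via linear interpolation of the monotone
    CDF C sampled on the grid x."""
    j = torch.searchsorted(C, q).clamp(1, len(C) - 1)
    c0, c1 = C[j - 1], C[j]
    w = ((q - c0) / (c1 - c0).clamp_min(1e-12)).clamp(0, 1)
    return x[j - 1] + w * (x[j] - x[j - 1])

def transport_to_midpoint(i_plus, i_minus):
    """T_+ = C_half^{-1} ∘ C_+v, where C_half^{-1} is the average of the two
    pseudoinverses, i.e. the quantile function of the 'halfway' measure."""
    n = i_plus.numel()
    x = torch.linspace(0.0, 1.0, n, dtype=i_plus.dtype)
    C_p, C_m = cdf(i_plus), cdf(i_minus)
    q = C_p  # quantile levels attained by i_+v at the grid points
    return 0.5 * (cdf_inv(C_p, x, q) + cdf_inv(C_m, x, q))
```

The displacement T_+(x) − x along a phase-encoding line yields the one-dimensional shift from which the field map estimate is assembled, line by line, in parallel.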
(a) Class structure of the PyHySCO loss function. The main class representing the loss function is EPIMRIDistortionCorrection, which evaluates the objective function J(b) = D(T(I_+v, b), T(I_-v, b)) + alpha * S(b) + beta * P(b). Regularization terms include a quadratic term of the form S(x) = 1/2 * ||x||_H^2, where H is a positive semidefinite LinearOperator, and TikRegularizer, which implements a Tikhonov regularization term of the form Q(x) = 1/2 * ||x - y||^2 for a given reference value y. InitializeOT implements the one-dimensional optimal transport initialization of the field map. Purple classes are abstract, and blue classes are concrete. Solid arrows indicate inheritance. Dashed arrows indicate dependencies and class objects that are attributes. (b) Class structure of PyHySCO optimization. The main class defining optimization is EPIOptimize, which applies the correction to solve for the true image given field map information. Solid arrows indicate inheritance. Dashed arrows indicate dependencies and class objects that are attributes.

Figure 4:
Figure 4: UML diagram of PyHySCO showing the classes and relationships for the (4a) loss function and (4b) optimization. An EPIMRIDistortionCorrection object defining the loss function is an attribute of every EPIOptimize object defining the optimization scheme.
The workflow of the PyHySCO toolbox from setup through optimization and distortion correction.

Figure 7:
Figure 7: Visualization of resulting field maps and images for one subject from the 3T dataset (Subject ID 211619). The first column in the top half shows the input data. The remaining columns show the results from PyHySCO using LBFGS, GN-PCG, and ADMM, compared with TOPUP and HySCO. For each optimization, the top two rows are the pair of images with opposite phase encoding directions, and the third row shows the absolute difference (with inverted color) between the pair of images. The bottom row shows the field maps estimated for each method. PyHySCO achieves similar image distance and field map smoothness improvements in less computational time.

Table 1:
Table 1: Details of the Human Connectome Project [Van Essen et al., 2012] datasets used for validation. LR/RL is left-to-right and right-to-left phase encoding, and AP/PA is anterior-to-posterior and posterior-to-anterior phase encoding. Further details of acquisition parameters are in [Van Essen et al., 2012].