A Simplified Crossing Fiber Model in Diffusion Weighted Imaging

Diffusion MRI (dMRI) is a vital source of imaging data for identifying anatomical connections in the living human brain that form the substrate for information transfer between brain regions. dMRI can thus play a central role toward our understanding of brain function. The quantitative modeling and analysis of dMRI data deduces the features of neural fibers at the voxel level, such as direction and density. The modeling methods that have been developed range from deterministic to probabilistic approaches. Currently, the Ball-and-Stick model serves as a widely implemented probabilistic approach in the tractography toolbox of the popular FSL software package and FreeSurfer/TRACULA software package. However, estimation of the features of neural fibers is complex under the scenario of two crossing neural fibers, which occurs in a sizeable proportion of voxels within the brain. A Bayesian non-linear regression is adopted, comprised of a mixture of multiple non-linear components. Such models can pose a difficult statistical estimation problem computationally. To make the approach of Ball-and-Stick model more feasible and accurate, we propose a simplified version of Ball-and-Stick model that reduces parameter space dimensionality. This simplified model is vastly more efficient in the terms of computation time required in estimating parameters pertaining to two crossing neural fibers through Bayesian simulation approaches. Moreover, the performance of this new model is comparable or better in terms of bias and estimation variance as compared to existing models.


INTRODUCTION
Recent advances in neuroscience technologies to study brain connectivity are providing new techniques to better understand the role of both structural and functional networks in complex neurological disorders (Bargmann and Newsome, 2014;Baliyan et al., 2016). A fundamental goal of modern neuroscience is to understand how different cortical regions interact with one another to produce observed behavior (Behrens et al., 2003;Yang et al., 2016). Diffusion MRI (dMRI) allows for identifying anatomical connections in the living human brain. These connections form the substrate for information transfer between remote brain regions and are therefore central to our understanding of brain function.
Fiber tractrography, inferred from dMRI, offers the only way to study brain structural connectivity non-invasively in vivo to reconstruct nerve fiber tracts. It has become an important tool in the study of a wide range of diseases affecting the brain, as it allows us to probe the shape and integrity of the white-matter pathways that connect the functionally-related cortical and subcortical regions (Yendiki et al., 2011(Yendiki et al., , 2016. A lot of research has focused on discovering an approach using parametrized models for extracting tissue structural information from dMRI data. An early, widely accepted model is the diffusion tensor model (DTI), which derives the orientation of a fiber within a voxel, the smallest analytic unit (Basser et al., 1994;Mori and van Zijl, 2002). Even though it is a robust model when there is only a single major diffusion direction in a given voxel, it has the inherent limitation when voxels contain more than one fiber pathways with distinct orientations. Voxels with crossing fibers account for more than 60% of the total voxels across the brain (Jeurissen et al., 2013). In order to handle the more complex situation of multiple crossing fibers within a voxel, a number of alternative models have been suggested, such as multiple tensor fitting (Tuch et al., 2002), Ball-and-Stick model (Behrens et al., 2003Hosey et al., 2005;Jbabdi et al., 2007), spherical deconvolution (Tournier et al., 2004;Anderson, 2005) and Q-ball imaging (Hess et al., 2006). The methods each have their limitations. Spherical deconvolution attempts to estimate the fiber orientation directly and assumes a fiber response function. The Ball-and-Stick method models axons as impermeable cylinders of a single radius and assumes isotropic diffusion outside the axon in the extracellular compartments. Although the spatial profiles from both the Ball-and-Stick method and spherical deconvolution appear similar, conceptually there is quite a difference. Geometric features of the signal itself are used to constrain the Ball-and-Stick model. Extensions to improve the Ball-and-Stick model have been proposed, such as the ball and racket method described by Sotiropoulos et al. (2012). This incorporates a Bingham distribution to model fanning of the tracts, common in the corona radiatus. An alternative to the Ball-and-Stick method is the composite hindered and restricted model of diffusion (CHAMRED) developed by Assaf et al. (2004). It is a hybrid model that uses diffusion tensors to model the extracellular diffusion as a hindered space and the intracellular diffusion is modeled as a restricted cylinder. This method has also been extended by Zhang et al. (2011) who apply a Watson distribution (a spherical analog of a Gaussian distribution) to the CHAMRED methodology to estimate the diameters of axons. The technique is limited by the assumption that axons are orientated in a single dominant direction excluding the possibility of detecting crossing or bending fibers. Here, we study and refine the Bayesian inference framework for the Ball-and-Stick model, as it is widely implemented in the tractography toolbox of the popular FSL software package (University of Oxford) and FreeSurfer/TRACULA software (Massachusetts General Hospital and Harvard Medical School).
In the situation of two crossing fibers, we discover from simulation analyses that the Ball-and-Stick model faces significant challenges in estimation of the fiber-related parameters, when we apply the Bayesian framework proposed by Behrens . For example, convergence of the sampling scheme of Markov Chain Monte Carlo (MCMC) procedure can be slow in the Ball-and-Stick model, because the sampling and simulation is conducted in a high-dimensional parameter space, and within a computationally challenging non-linear regression modeling context. This can lead to long sampling chains without convergence. The study of a simplified version of a Ball-and-Stick model is motivated by the need for reducing complexity of the non-linear regression model, more robust estimation of the parameters related to neural fibers, and improved efficiency in MCMC simulations.
Focus on the two-fiber model has practical justification. It is difficult for 3 or more fiber directions to be well separated, except under limited scenarios, and even then, it can be difficult to distinguish 3 fiber models from isotropy. Hence, accurately estimating 3-fiber models is not often feasible, especially with a limited number of gradients.
We propose a simplified version of Ball-and-Stick through a two-stage approach. First we develop a set of simultaneous equations for estimating "nuisance" parameters (i.e., those that are not directly related to the features of neural fibers). Further, we simplify this non-linear regression model through rotation by exploiting geometric properties of the surface of the intensity model shape.
This simplified model is validated by comparing fiber-related parameter estimation with the full Ball-and-Stick model in an efficient, adaptive MCMC estimation framework. Simulation analysis and an in-vivo human data example demonstrate the improved computational efficiency and estimation accuracy of the proposed simplified model.
We also explore how to dramatically improve computational efficiency by applying voxel-specific dynamic stopping rules for MCMC sampling based on convergence monitoring statistics. Finally, methods for determining between one-vs. two-fiber model fit and switching models within a single MCMC run are studied. Together, our approaches hold the promise for vast improvements in computational times for generating probabilistic fiber tracking, which currently is extremely slow.

Introduction of Ball-and-Stick Model
A local partial volume model for diffusion signal attenuation (e.g., restriction of water diffusion), the Ball-and-Stick model, was proposed by Behrens et al. (2003). The model relates the local fiber structure to the diffusion signal by assuming different components within each voxel. The diffusion-weighted MR signal (S i ) is split into multiple components for each fiber orientation, and a single isotropic component.
The Ball-and-Stick model is used to retrieve componentspecific information (free and restricted diffusion) from the signal decay in diffusion MRI experiments. The fully parameterized model can be fitted to dMRI data (similar sampling scheme as for DTI). This model is notably implemented in BedPostX and available in FSL (http://fsl.fmrib.ox.ac.uk/fsl/ fslwiki/FDT).

Formula and Notation of Ball-and-Stick Model
The predicted signal for each diffusion-weighted measurement at each voxel is: where S i is the observed signal intensity along the i-th diffusionweighting gradient; S 0 is the signal intensity without diffusion gradients; b i is the diffusion weighting factor for the i-th gradient; f k is the volume fraction of the k-th fiber; (1-L k=1 f k ) is the volume fraction for isotropic tissue; d is a diffusivity constant; r i is the directional unit vector of the i-th diffusionweighting gradient; R k is a rotation matrix depending on (θ k , φ k ): the spherical coordinates for elevation angle and azimuth angle, respectively.
In the model, the noise is modeled separately for each voxel as independent identically distributed Gaussians with a mean of zero and standard deviation of σ across acquisitions . Note that the commonly-used noise distribution for MRI signal will converge to a Gaussian distribution asymptotically (Gudbjartsson and Patz, 1995) in the context of high signal to noise ratio. In Equation 1, the formula consists of two major components. The first component in the bracket, the "Ball" component, S 0 1 − L k=1 f k ) exp (−b i d , represents the part of the i-th diffusion signal intensity attributable to the isotropy (e.g., free water). The second component, the "Stick" component, , denotes the contribution attributable to fiber tracts.
Instead of estimating all the parameters at once with nonlinear regression under an MCMC framework, a proposed simplified model will reduce the number of parameters estimated through MCMC and non-linear regression. Estimation approaches will be derived for d and fiber-specific parameters (e.g., f k , θ). As is commonly applied, note that a good estimator for S 0 is the average of the diffusion signal values when b i is equal to 0.

Reformulation of the Ball-and-Stick Model
The reformulation of the Ball-and-Stick model concentrates on the "Stick" component of diffusion attenuation, (Behrens et al., 2003;Dell'Acqua et al., 2007). Recall that the parameters that reflect the k-th neural fiber orientation is (θ k , ϕ k ) in spherical coordinate system, where θ refers to the elevation angle, which measures the angular separation between the corresponding directional vector and XY-plane; ϕ refers to the azimuthal angle, which measures the angular separation between directional vector and XZ-plane. It can also be denoted in the Cartesian coordinate system, so that the notation for the corresponding unit vector direction is t k = cosθ k cosϕ k cosθ k sinϕ k sinθ k T or t i,x t i,y t i,z T . Also, let r i be the unit directional vector of the i-th gradient, r i,x r i,y r i,z T = cosθ i cosϕ i cosθ i sinϕ i sinθ i T .
The dot product r i · t k represents the projection of i-th gradient onto the direction of the k-th neural fiber. Denote ik as the angular separation of these two directions, then r i · t k = cos( ik ) | r i | | t k | = cos( ik ). Generally, the expression of exp(−b i dr T i R k AR T k r i ) in Equation (1) can be reformulated as exp −b i dcos 2 ik , so that For example, when the i-th gradient is perpendicular to the k-th fiber, ik is equal to 90 • . This leads to Equation (2) being equal to 0, and the diffusion signal reaches a maximum with respect to the fiber.

Simplification of the Two-fiber Ball-and-Stick Model
In our discussions to follow, we focus on two-and one-fiber models, and assume that the b-value is constant across directions. Simplification is in relation to two fiber models.

Derivation of Estimators for Voxel-Specific Diffusivity and Sum of Volume Fractions
Spherical mean techniques enable us to generate an equation based on observed diffusion data, that also involves diffusivity coefficient d and the value of the sum of fiber volume fractions, f k . Importantly, this is achieved while averaging out the effect due to fiber dispersion (Kaden et al., 2016), so that the equation should be fairly stable and minimally affected by noise. It is based on the insight that for a fixed b-value, the spherical mean of the diffusion signal over the gradient directions does not depend on the fiber orientation distribution. Specifically, the mean signal is invariant with respect to the specific fiber orientations within a voxel, when all other sequence parameters S 0 , d, f, and the diffusion weighting factor b ≥ 0, are fixed.
Suppose we have a fixed set of L fibers, and gradient directions are uniformly spread across the sphere. We then derive the expected intensity value in this setting, which depends on d and f k , but not the fiber orientations. Let represent the angular separation value between a gradient and fiber orientation. In the uniform gradient direction setting, the distribution function of values across the sphere is of the form f( )= 1 2 sin , across the range of values in (0, π). Note exp(−bdr T i R k AR T k r i ) can be reformulated as exp −bdcos 2 ik , as in Baliyan et al. (2016). By the integration across all possible gradient directions (i.e., values) for each fiber orientation, we can derive Equation (3) in the setting of the Ball-and-Stick model Equation (1) to obtain an average theoretical mean intensity value that can be approximately equated with the empirical mean: given that Note that the spherical mean of diffusion intensity derived in Equation (3) does not depend on the fiber orientations. We will thus use Equation (3) as a basis for estimating parameters, f k and d, in the Ball-Stick model outside of the non-linear regression framework. For that, we still need a second equation for estimation.

Estimating the Directionality of the Longitudinal Axis
To simplify the estimation of the Ball-and-Stick model (Equation 1), we rely on a feature of the diffusion intensities that becomes clear as one visualizes their corresponding surface of intensities, as in Figure 1. Given a voxel containing two fibers, we define the longitudinal axis to be the axis perpendicular to the two fibers, and denote it by r ı . The identification of r ı is critical for deriving a second simultaneous equation for estimation. It captures two major geometric features of the 3D surface of diffusion intensity values (see Figure 1): (1) The maximum of diffusion-weighted signal lies in the direction of r ı . (2) Knowing its direction can be used to align the hyperplane of two neural fibers onto the X-Y plane through rotation, which can further reduce the complexity of angular parameterization of the orientations of the two fibers. Use of the first feature allows us to derive a needed second simultaneous equation for f k and d.
First, we identify the direction associated with the maximum of the diffusion-weighted signal, the orientation of r ı . Since r ı is perpendicular to both fibers simultaneously, the angular separation between r ı and the two fibers, denoted as ı1 and ı2 , are 90 • , so that cos( ı1 ) = cos( ı2 ) = 0. Recall in Equation 1 that R 1 is the rotation matrix of the orientation of the first neural fiber, and R 2 is that of the second fiber. Then we have r T l R 1 AR T 1 r T l = cos 2 ( ı1 ) = 0, r T l R 2 AR T 2 r T l = cos 2 ( ı2 ) = 0 based on Equation (2), where r T l R 1 AR T 1 r T l and r T l R 2 AR T 2 r T l reach their minimum simultaneously. The minimum of this expression further leads to the maximum possible intensity value on the surface, when it is substituted into Equation (1). Given b ≥ 0, we thus have: By solving the set of Equations (3-4), we can obtain estimators for d and f k outside of the non-linear regression framework.
Secondly, the plane of two neural fibers can be mapped onto the X-Y plane, by rotating r ı onto the Z-axis. The rotation of r ı onto the Z-axis maps the gradient directions onto a new coordinate system. Given a known direction of r ı , the alignment of r ı onto the Z-axis can be achieved by two steps: (1) Clockwise rotate Z-axis with angle ϕ ı , such that the rotated longitudinal axis is on the X-Z plane, with an angular separation of θ ı relative to the X-Y plane; (2) Clockwise rotate Y-axis by (π/2θ ı ), such that the previously rotated longitudinal axis will be mapped to the Z-axis. This procedure, a two-step matrix multiplication, results in a rotation matrix R rot . Note that we use the prime sign ( ′ ) to denote the mapping objects in the new coordinate system. For example, the directional vector of i-th gradient in the new coordinate system is denoted as Similarly, the directional vector of the k-th neural fiber in the original coordinate system, t k is mapped to the vector t The mapping into the new coordinate system is through the rotation matrix R rot : sinθ l cosϕ l sinθ l sinϕ l −cosθ l −sinφ l cosϕ l 0 cosθ l cosϕ l cosθ l sinϕ l sinθ l   where r ı = [cosθ l cosϕ l cosθ l sinϕ l sinθ l ]. Also note that we can map the vector in new coordinate system back to the original coordinate system: With the hyperplane of two neural fibers being rotated onto the X-Y plane, the elevation angle θ ′ k relative to the X-Y plane is equal to 0. In this case, t ′ k = cosϕ ′ k sinϕ ′ k 0 T , so that the orientation of the k-th fiber can be denoted by one parameter, ϕ ′ k . Thus, the complexity of denoting the orientation of two fibers can be reduced from four parameters (θ 1 , ϕ 1 , θ 2 , ϕ 2 ) to two parameters (ϕ ′ 1 , ϕ ′ 2 ). After the acquisition of estimated values of (ϕ ′ 1 , ϕ ′ 2 ) in Equation 5, we can convert angles back into the original coordinate space.

Formula of Simplified Model
In the new coordinate system, the relative angular separation between two fibers will remain the same as that in the original coordinate system. For example, the angular separation between the i-th gradient and the k-th neural fiber in the original coordinate system, ik , will be the same as ′ ik in the new system, such that cos 2 ( ik ) = cos 2 ( ′ ik ), which derives (r i · t k ) 2 = (r ′ · t ′ k ) 2 based on Equation (2). Furthermore, we can reformulate the "Stick" component of diffusion intensity attributable to the fiber tracts, 2 k=1 f k exp(−bdr T i R k AR T k r T i ), proportional to the baseline intensity S 0 , as Thus, in the new coordinate system, we obtain the simplified model by reformulating Equation 1 as In the simplified model (Equation 5), we have demonstrated below that the nuisance parameters that are not directly related to the features of neural fibers, S 0 , d, f k , r l , can be estimated outside the framework of MCMC estimation. We apply the "hat" sign to denote the estimators of parameters (e.g., S 0 , d, f k , r l ). Thus, the number of parameters can be reduced from 9 in the full Ball-and-Stick model (Equation 1) to 4 in the simplified model (Equation 5). As we will establish, the simplification of the non-linear regression model has important ramifications for computational efficiency and can also lead to improved statistical accuracy in fiber-related parameter estimation.

Use of Spatially Smoothed Data
The purpose of spatial smoothing is to eliminate the effect of noise in the intensity values through spatially weighted averaging. Similar to the notation in the Ball-and-Stick model (Equation 1), S i denotes the signal intensity along the i-th gradient direction in Equation (6). For illustration, we assume 64 observed gradient directions. Note that the estimation of nuisance parameters is prone to the effect of noise from the diffusion intensity measurement. To have a robust estimator of nuisance parameters, we propose smoothing each data point through a von-Mises smoothing kernel, centered on the respective gradient direction. von-Mises distributions are used in directional data applications (see Equation 6) and are analogous to wrapped normal distributions for angular data. Let x ij denote the angular distance between directions i and j. S j,smooth is the smoothed signal intensity for the gradient direction j. The denominator is to assure that the weights applied to the S i sum to 1.
where f vonMises x ij , κ = e κcos x ij 2πI 0 (κ) , and I 0 (κ) is the modified Bessel function of order 0. In our simulation analysis, the gradient directions are 64 evenly distributed directions, as in Jones et al. (2002). We will demonstrate that spatially smoothing the diffusion data derived from noisy observed data can improve estimation of the nuisance parameters, if a proper smoothing kernel is applied. The concentration or "spread" of a von-Mises distribution is determined by a kappa (κ) parameter. It is thus of interest to identify appropriate κ values, which control the extent of smoothing, that ultimately improve estimation.
One facet of estimation that can be improved with smoothing is the identification of the longitudinal axis, which involves selecting the gradient direction with the largest (smoothed) intensity value. To improve estimation bias that could result from the limited granularity of the observable gradient directions, we add hypothetical gradient directions from which observed data is not actually acquired but estimated through smoothing of observed values. In our cases, this involved 64 "extra" directions, although another number of discrete directions could have been added as well (see Figure 2). The hypothetical directions are generated by the rotation of the whole set of observed gradients, and guided by the principle of maximization of the minimum distance between hypothetical and observed directions. We found for our simulations that the largest angular separation from a gradient to any neighboring direction was <10 • . These hypothetical directions thus expand the estimation space of possible directions for r l , and improve resolution. A smoothed value is obtained as a weighted linear combination of all the observed data. The weights vary and depend on the angular separation between a given (hypothetical) direction, and the other observed directions around it. This is determined by the relative density values of the von-Mises distribution, with a given concentration parameter κ (see Equation 6). As κ decreases, the spread of the distribution is larger, indicating that the neighboring signal intensity data are assigned more weight when smoothing.
By the definition of r ı and Equation (4), we can readily derive Equation (7) as the basis for obtaining r l . The estimator r l is determined as the direction of the maximal smoothed signal value among all the observed and hypothetical gradient directions. The accuracy of estimator r ı does rely on selection of the smoothing kernel.
Following Equation (4), it also is of interest to accurately distinguish the maximum intensity value itself, and not just the longitudinal axis direction associated with it. Note that the estimation of diffusivity (d) and sum of fiber volume fraction ( f k ) depends on the maximum intensity value in Equation (4). This is a second facet in estimation that can be improved with smoothing.
To note, the estimators on the magnitude of maximum signal (max(S)) and the orientation of maximum signal (r ı ) appear to, respectively have improved accuracy by choosing different concentration parameters for smoothing, even though both estimators are related to the maximum signal value. To differentiate the smoothing parameters, we denote this second von-Mises smoothing kernel with kappa parameter κ 2 .

Guidance on Smoothing Kernel Selection
Extensive simulations have been conducted in Yang (2018) to assess the effect of smoothing kernel selection on nuisance parameter estimation, and more importantly, on fiber-related parameters. These were done assuming 64 gradients and signal to noise ratio of 20. Other signal to noise ratios were considered in Yang (2018), and the results give similar conclusions about relative performance. In summary, a good empirical choice of kappa value, κ, is between 35 and 70 for estimation of max(S), which in turn impacts the estimation of d, f k . In our simulation analysis, we discover that the error between smoothed maximum diffusion data value across gradients, max(S j,smooth ), and the true maximum diffusion data value, max(S), is <5% [see (Yang, 2018)]. For estimation of r ı , a good empirical choice of κ 2 value will be around the range between 10 −4 and 0.1. Recall that the estimate of r ı determines the rotation of an estimated hyperplane for two fiber orientations. Accuracy depends on angular separation of the two fibers. For instance, at 90 • , estimation of r ı is highly accurate with κ 2 = 0.1, with little to no bias and standard deviation of the discrepancy from the true direction of around 1.5 • across simulations. For instance, even at 50 • of separation, bias is around 2.7 with standard deviation of discrepancy of 5.8. We adopt κ 2 = 0.1, and consider it a reasonable value across different angular separation scenarios.
For each of these parameters, a proposal distribution q(Y| t−1 ) (see Equation 8) is used to generate a new candidate parameter value Y. t is then set to Y with a probability α( t−1 , Y) = min(1, π (Y) π ( t−1 ) ) (e.g., π (Y) is posterior likelihood given Y, the other most recent parameter values, and observed diffusion data). Otherwise, reject the value Y, and set t to the previous value.
4. Update σ −2 with a Gibbs sampling step. The updating of σ −2 posterior distribution is conditional on updated values of f 1 , ϕ ′ 1 , ϕ ′ 2 . 5. The proposal variance ε 2 for each parameter is adapted every 50 iterations, according to the acceptance rate among those iterations (<0.44, then decrease ε by the factor of δ; > 0.44, increase ε by the factor of δ. Note that δ = min[exp(0.01), exp(1/c)], where c is the square root of the modulus of t by 50) (Green, 1995). 6. After the MCMC iterations have finished, f 2 is derived as f k minus f 1 , and (0, ϕ ′ k ) is reformulated back into (θ k , ϕ k ) by re-rotating the angular coordinates to the original coordinate system, guided by the estimate of r ı .
With the full Ball-and-Stick model, we implemented the approach in Behrens et al. (2007) that also fully aligns both methods. This includes adopting the same prior specifications across models when applicable, and the same adaptive MCMC framework. This facilitates comparison between the simplified and full models in terms of computational performance and estimation accuracy of the fiber-specific parameters.
We record the ongoing chains of sampled values until stopping, and the collection of these sampled values can be reflected in histograms and time-series plots (e.g., see Appendix).
Estimates of posterior medians and standard deviations are obtained from sampled values after the burn-in period. These estimates will be used to describe the posterior distributions of the parameters of interest (e.g., f 1 , f 2 , ϕ ′ 1 , ϕ ′ 2 ) when evaluating the accuracy and precision of estimates.

Simulation Results of Fiber-Specific Parameters
The performance of estimation on the fiber-specific parameters in two crossing fiber scenarios will now be compared between the simplified model and the full model. Again, for the simplified model, the estimated values of the non-fiber-specific parameters S 0 , d, f k will be substituted in. To assess the accuracy of estimation of fiber-specific parameters in a MCMC framework, we extract every 10th sampled values from the last half of MCMC chains with 100,000 iterations. For each scenario of parameters being studied, 100 data sets are simulated. Estimates are derived from each data set, so that distributions of these estimates can be generated. Distributions of bias in the posterior medianbased estimates can thus be visualized for comparison with the corresponding ground-truth parameter values. Again, we set ground-truth model parameters to be S 0 = 400, b = 1500 s/m 2 , d = 1/1500 m 2 /s, f 1 = 0.4, f 2 = 0.5, θ 1 = 0, θ 2 = 0, φ 1 = 60 • , φ 2 = 120 • . We also set κ 2 = 0.1. The number of observed gradient directions is 64, as in Jones et al. (2002).

Results for Volume Fractions of Individual Fibers
In Figures 3A,B, the violin plot (Hintze and Nelson, 1998) displays the accuracy level of f 1 and f 2 under different model settings. Violin plots are a combination of a box plot and a kernel density plot. Specifically, it starts with a box plot, where the thick bar represents the IQR of the bias with the white point "median" in the middle, and the thin bar "whiskers" are drawn to 1.5 × IQR below the 1st quartile or above the 3rd quartile. It then adds a rotated kernel density plot to each side of the box plot.
Generally, the distribution of bias that is centered around 0 and with relatively small variance has a higher accuracy level. The violin plots in yellow and red display the distribution of bias of parameter estimates (medians) of f 1 and f 2 in the simplified models, based on estimates of f k that are derived from different smoothing kernels (i.e., different κ values). The plot in green shows the related distributed bias in the full model. When the κ value is in the range of ≥20, the bias and variance of estimation error appear smaller in the simplified model than the full model. In Table 1, the estimation bias (mean ± sd) for f 1 and f 2 demonstrates a more accurate estimate in the simplified model 0.0026 ± 0.0719 and 0.0054 ± 0.0788, respectively (κ = 50), while the corresponding bias is −0.053 ± 0.1344 and −0.0842 ± 0.1496 in the full model.
In Figures 3C,D, the violin plot displays the precision level of f 1 and f 2 under different model settings. Posterior standard deviation in the simplified model (κ ≥ 20) is smaller compared with that in the full model. In Table 1, the posterior distributions of f 1 and f 2 demonstrate a more precise estimate in the simplified model 0.0252 ± 0.0094 (κ = 50), while the corresponding estimation bias is 0.054 ± 0.0835 and 0.0585 ± 0.0833, respectively in the full model.
Among the simplified models, we noticed that the performance in estimation of f 1 ( Figure 4A) and f 2 ( Figure 4B) is closely related to the performance of estimator of f k , in that a more biased estimate of f k will lead to a more biased estimate of f 1 and f 2 . Across varying κ values, the estimates of volume fractions with respect to κ equal to 50 obtain the smallest bias and variance, which is consistent with the empirical suggestions for κ value selection in the estimation of max(S), d, and f k . Similarly, beyond 60 degree fiber angular separation, we also found the estimation of f 1 and f 2 with κ equal to 50 generally performs at least as well as those for other κ values. Generally, when the fiber angular separation is larger, the estimation of parameters is "easier, " so that the estimates of f 1 and f 2 are more accurate and precise. For instance, when the angular separation between fibers reaches its largest value of 90 • , the estimation bias of f 1 and f 2 has smallest mean (relative to 0) and standard deviation.

Results for Orientations of Individual Fibers
Similarly, in Figure 4, the violin plots in yellow and red display the distribution of bias from parameter estimates (medians) 1 | Fiber-specific parameters (f 1 , f 2 , θ 1 , φ 1 , θ 2 , φ 2 ) and angular separation bias of two fibers in 64 gradient directions across 100 simulations. of θ 1 , ϕ 1 , θ 2 , ϕ 2 in the simplified models, derived from different smoothing kernels (e.g., different κ values). The plot in green shows the related distribution of bias in the full model (Figures 4A-D). We define angular bias as the difference in estimated and true fiber direction in their shared hyperplane. Note that this measure of bias incorporates information about the estimation of both θ, ϕ. We assess angular bias of fiber estimates in Figures 4E-F and Table 1.

Estimation bias (mean ± sd)
In comparison of the results, the simplified and full models demonstrated a similar accuracy level of performance in estimating the fiber orientations (Figures 4E,F and Table 2). Bias in estimated fiber orientations for the simplified model are very close or slightly better than for the full model, when the κ value is ≥20. The performance of estimates for angular separation of two fibers indicates similar results between the simplified model and the full model (see Table 1). Similarly, beyond the scenario of 60 • fiber angular separation, we also found comparable patterns between some simplified model setting and the full model under other scenarios (e.g., 90 • , 30 • fiber angular separation). The simplified model demonstrates a similar precision level of performance in estimating the fiber orientations (Figures 4G-J and Table 2), in that the MCMC posterior standard deviation on fiber orientations in the simplified model are also very close or slightly better than the full model, when the κ value is at least 20.

Computational Efficiency of MCMC Simulation
The computation of simulation analysis was run on a server computer with the following: CPU Intel Xeon E5-2699 v5 32 cores, 128 GB RAM, Linux-based CentOS 7 environment. In terms of the computing efficiency, the simplified model required an average of approximately 0.65 min per 100,000 iterations as in 3.5.3, when the computation of 20 simulated samples was run simultaneously over 10 cores. The full model required an average of approximately 9.5 min per 100,000 iterations under the same conditions. The simplified model thus offers a great reduction in computing times.

Real Data Examples and Three Fiber Simulated Scenarios
To obtain an in-vivo human data example, we used freely available neuroimaging data that was acquired by the MGH HCP team for the Human Connectome Project (https://db. humanconnectome.org/data/projects/MGH_DIFF). The study subject ID is MGH 1001, a female aged between 40-44 years old. A data analysis was conducted in FSL following their processing pipeline guide, using BET, EDDY_CORRECT, DTIFIT, and BEDPOSTX. In order to evaluate the full and simplified models with real data, we selected an area of 36 contiguous voxels in the corona radiata region as test samples. The corona radiata is an area that contains many crossing fibers. See Figure 5, which indicates location of the region of interest being analyzed.
To assess how our 2-fiber simplified model behaves in such situations, we also estimated 3-fiber models using BEDPOSTX for comparison. To indicate the presence of 3 fibers, we examined the volume fraction of the third fiber (f 3 ) across the entire   brain and selected the 99th percentile value (f 3 = 0.1) as the threshold for indicating the presence of the third fiber. According to this criterion, 20 of the 36 voxels were likely to contain 3 fibers. As there is no gold standard for real data, we simply present the results (see Supplementary Table 1, where specific voxels are identified, and voxel-level estimates are listed). We also compute respective angular differences between the first two estimated directions of the 3-fiber BEDPOSTX model and the two directions estimated by the simplified model. See Figure 6, where the voxels are in the region of interest of Figure 5. For the apparent 3-fiber voxels, the median angular and volume fraction f-value difference in estimates for the first of the directions are 10.25 • and 0.014; for the second direction they are 11.30 and 0.033. For the apparent 2-fiber voxels, estimated median angular and f-value differences are 2.87 • and 0.0070, and 4.89 • and 0.0086. Overall, median differences with the first two angles We also considered three simulated 3-fiber scenarios, 1,000 replications each with signal-to-noise ratio equal to 5% of S 0 = 400: 1) ϕ 1 = 0, θ 1 = 0, ϕ 2 = 90, θ 2 = 0, ϕ 3 = 45, θ 3 = 45; f 1 = 0.3, f 2 = 0.25, f 3 = 0.2.
We used the BEDPOSTX 3-fiber model and simplified 2-fiber model for estimation. Given estimated directions from each method, we considered all permutations of pairing the estimated directions with the true fiber directions in the respective simulations, and selected the permutation with smallest average angular difference between paired estimated and true directions. Angular and f-value differences, as well as the number of permutations for the simplified model that a true direction is associated with an estimate, are listed in Table 3. Note that for both methods, the 3-fiber scenarios that were studied present difficult estimation problems, and that bias in estimation is large. An exception is in Scenario 3, where the third direction is consistently estimated in a fairly accurate manner by the simplified model.

Impact of Increased Gradient Directions (64 vs. 128 Gradients)
A simulation study was implemented to demonstrate how the increase in the number of observations from gradient directions impacts the estimation results. This has practical importance, as it is expected in the near future for standard clinical DWI acquisitions to have increased number of gradients. We simulated a set of 128 evenly distributed gradient directions and followed the same estimation procedures and parameter settings to obtain the parameter estimates as we did previously with simulations with 64 gradient directions. The performance of estimation on the fiber-specific parameters is compared between 128 vs. 64 gradient directions, respectively between the simplified model and the full model. Given 128 gradients, the bias and variance of fiber-related estimates are smaller in the simplified model than the full model, when the κ value or estimating is in the range of 20 or higher (see Figure 7). Within the results from the simplified model, the estimates of volume fractions with respect to κ between 35 and 70 is likely to obtain the smallest bias across varying κ values. When the value of κ was chosen as 20, 35, 50, 70 or even without smoothing, the estimation error on volume fractions in the simplified model was generally closer to 0 than that in the full model.
In terms of fiber orientation, the simplified model demonstrates a similar level of performance in estimation. The biases of estimates of fiber orientations in the simplified model are very close and slightly improved relative to the full model when the κ value was 20 or higher. Similarly, as with the trend with volume fraction estimation error, the estimates of fiber orientations with respect to κ between 35 and 70 obtain the smallest bias across varying κ values.
Comparing estimation bias with 128 vs. 64 gradients (  indicates a more accurate and precise estimate. While bias also is smaller in estimating the full model with more observations from added gradients, the magnitude of improvement was not as great as with the simplified model (κ = 50). Within the results of the simplified model, an empirical suggestion on the choice of κ value is between 35 and 70 to obtain smaller bias and standard deviation of estimation error, no matter if the number of gradients is 128 or 64 (Tables 1, 4).
As we investigate other challenging estimation scenarios with two crossing fibers (e.g., fiber angular separation of 40 • ), we have seen that the MCMC chains in the simplified model have a much higher likelihood of accurate convergence with 128 gradients. The simplified model is clearly more stable in such scenarios.

Dynamic Stopping Rule in MCMC Estimation
The MCMC algorithm has served as a powerful approach for Bayesian estimation of DWI modeling. This method involves simulating from a complex and generally multivariate target distribution with respect to the parameters in the models. Markov chains are generated and stationary distributions of sampled parameter values are targeted (Cowles and Carlin, 1996). In application of MCMC algorithms, a default number of iterations can be overly large to guarantee a sufficient length for burnin period, and to attain stationary convergence. However, this can result in an unnecessary computation load. This issue is compounded due to the large number of voxels. We thus suggest applying convergence criteria such as the Geweke diagnostic (Green, 1995) to dynamically detect the stationary convergence along Markov chains, which provides guidance on when to stop sampling at the voxel level. Further technical discussion and an example that illustrates great computational savings through reduced number of iterations for MCMC are given in the Appendix. An important practical implication is that the simplified model not only requires less computation for a given set of iterations, as was seen earlier, but also requires a lesser number of iterations. In conjunction, these computational savings are multiplicative.

Model Selection: 1-Fiber vs. 2-Fiber Simplified Model
We also have considered how to identify 1-fiber vs. 2-fiber model fit within a voxel during a same MCMC run. This is similar in purpose as the Automatic Relevance Determination (ARD) sparsity prior in Behrens et al. (2007), which identifies whether to continue to include fiber components within a same MCMC run. An advantage of these approaches is that model fit can be conducted without having to conduct MCMC separately per model. We can efficiently identify sampling chain behaviors that indicate over-fitting of a 2-fiber model with 1fiber data without the need to add to model complexity, as ARD priors do. One-fiber models converge in MCMC estimation  4 | Estimation bias of the fiber-specific parameters (f 1 , f 2 , θ 1 , ϕ 1 , θ 2 , ϕ 2 ) and angular bias of two fibers in 128 gradient directions across 100 simulations. quite quickly relative to 2-fiber models, so in conjunction with dynamic stopping of MCMC sampling at the voxel level, computation can be reduced even further. Details are given in the Appendix.

DISCUSSION
The key step for our simplified model is Equation (5), which allows for focus on fiber-specific parameters that reduce the parameters in the non-linear regression that are to be estimated in a MCMC framework. Non-fiber-specific parameters can be estimated reliably through simultaneous equations for estimation that mainly take advantage of novel statistics based on signal shape across gradient acquisitions. Specifically, the estimated maximum (max(S)) and spherical mean (S) of signal shape allow for a practical and valid solution for estimating the parameter d and f k , and for estimating the longitudinal axis (r ı ) of the signal surface shape. The latter is used to guide the rotation of the hyperplane of two fiber orientations to further reduce angular parameterization. Spatial smoothing of intensity values improves estimation performance. After the replacement with the non-fiber-specific parameter estimators, only three parameters are then to be estimated in the MCMC framework under the scenarios of two crossing fibers.
The results of fiber-specific parameter estimation consistently demonstrate the simplified model approach be more accurate, precise and efficient. From our extensive simulations across different data sets, we see that fiber parameter estimation with the proposed approach is apparently unbiased, or close to unbiased. Nuisance parameter estimation also is apparently unbiased with relatively small error (Kaden et al., 2016). Simulations of the complete two-stage procedure give us a sense of the variability in estimation, in terms of the resultant fiber parameter point estimates from Bayesian analysis (e.g., Figures 4, 5). Since the proposed estimation process is (close to) unbiased and can have less variability, it is relatively attractive. First, the estimates of volume fraction f 1 and f 2 in a simplified model setting (e.g., with smoothing parameters κ = 50, κ 2 = 0.1) can always outperform that in the full model setting regardless of the angular separation ϕ fibers , in that the mean bias can be slightly reduced and the precision level can improve by more than 2-fold (smaller standard deviation of estimation error). Similarly, the estimates of fiber orientation can also be more accurate and precise in a proper simplified model. Importantly, we observed that the computational run time for the simplified model can approximately be 14-15 times faster than that in the full model for the same number of MCMC iterations. This does not even take into account that the convergence within a single chain is much faster in the simplified model, in terms of needing a fewer number of iterations before convergence is attained. This simplified model can require up to 10 times less iterations for stopping. With dynamic stopping of MCMC chains per voxel, these reductions in computation times are multiplicative. The issue of computational efficiency is critical as current probabilistic fiber tracking that relies on voxel-level MCMC estimation of the full Ball-and-Stick model is extremely slow and time-consuming.
Since we are conditioning on the estimated nuisance parameter values, the Bayesian estimation process for the fiber parameters does not reflect the uncertainty in nuisance parameter estimation. In terms of using the posterior distributions for probabilistic fiber tracking, we do note that the median values of the resultant posterior distributions will often be close to the associated true value, due to the apparent lack of bias in point estimation. Smaller posterior variances will lead to more of the sampled values being close to the true value as well, and reduced variability in fiber tracking. So, we think it will be attractive to conduct probabilistic fiber tracking with the proposed MCMC-generated posteriors distributions.
The simplified model still inherits the limitation from the full model in that it cannot reliably estimate two crossing fibers that are not far apart in terms of angular separation when the number of gradient directions is limited. For instance, when two fibers are 40 • apart ( ϕ fibers = 40 • ), the angular bias is around 15 ± 10 (mean and standard deviation from 100 simulations) given 64 gradients; however, the bias is 8 ± 7 given 128 gradients. The number of gradients in an MR diffusion image acquisition thus has an obvious impact on the accuracy and precision of parameter estimation, as well as extending the feasibility of estimation when fiber directions are not as well separated. In the near future, 128 gradient scans will be standard in clinical acquisitions. Based on our simulations, fiber direction estimation based on the simplified model holds promise for even greater relative precision improvements as the number of gradients increases.
The proposed approach clearly does not have the capability to model 3-way crossings, even if it can sometimes estimate one or two of the directions accurately in such situations. Still, the proposed approach can be useful if used in conjunction with 3 fiber models, such as in an adaptive "step down" model, once 3-way crossings can be ruled out. We illustrate such an adaptive "step down" approach from two-fiber to one-fiber models and show computational savings in that setting. Nonetheless, 3-fiber modeling can be very difficult even when explicitly acknowledging 3 fibers in a model, as illustrated in the simulations presented here.
Although not explored here, this approach can be extended to multi-shell data. One possible approach with multi-shell data could be to estimate the d and f k parameters in parallel based on respective samples with a specific b-value, and then pooling the estimates, such as by respectively, averaging them.
The MCMC estimation algorithms can then be implemented as in section Adaptive MCMC estimation of simplified and full Ball-and-Stick models, while recognizing different b-values as in Equation (1).

CONCLUSION
In summary, a simplified version of the Ball-and-Stick model is proposed. By reducing the parameter space dimensionality in the non-linear regression estimation framework, the computing time in the simplified model can be shortened dramatically. We believe the overall time savings will be tremendous as we transition to implementation at the whole brain level. Meanwhile, the accuracy and precision of estimating the fiber volume fractions and fiber orientation can also be improved with less complex and numerically simpler non-linear regressions. Future consideration will be given to Laplace approximations of the posterior distributions associated with fiber parameters, which is more feasible with the reduced dimensionality. We will also explore whether some of the ideas applied here can be extended to 3-fiber models, to reduce parameterization of corresponding non-linear regression models.

ETHICS STATEMENT
The human subject data used in this study was obtained from a freely available MRI database, the Human Connectome Project. Their study was conducted according to Good Clinical Practice guidelines and the Declaration of Helsinki. Ethical approval was obtained from their Institutional Review Board before commencing subject enrolment. Written informed consent was obtained from all subjects and/or their authorized representatives before protocol-specific procedures were carried out. Ethics approval from our local ethics committee was not required to analyze this anonymized dataset.

AUTHOR CONTRIBUTIONS
SY: study design, analysis, interpretation of data, preparation of manuscript. KG: study design, interpretation of data, preparation of manuscript. KS: study design, interpretation of data, preparation of manuscript. SS: study design, preparation of manuscript. SC: interpretation of data, preparation of manuscript. CT: study design, analysis, interpretation of data, preparation of manuscript.

FUNDING
This work was supported by NSF DRL 1561716.