Spherical convolutional neural networks can improve brain microstructure estimation from diffusion MRI data

Diffusion magnetic resonance imaging is sensitive to the microstructural properties of brain tissue. However, estimating clinically and scientifically relevant microstructural properties from the measured signals remains a highly challenging inverse problem that machine learning may help solve. This study investigated if recently developed rotationally invariant spherical convolutional neural networks can improve microstructural parameter estimation. We trained a spherical convolutional neural network to predict the ground-truth parameter values from efficiently simulated noisy data and applied the trained network to imaging data acquired in a clinical setting to generate microstructural parameter maps. Our network performed better than the spherical mean technique and multi-layer perceptron, achieving higher prediction accuracy than the spherical mean technique with less rotational variance than the multi-layer perceptron. Although we focused on a constrained two-compartment model of neuronal tissue, the network and training pipeline are generalizable and can be used to estimate the parameters of any Gaussian compartment model. To highlight this, we also trained the network to predict the parameters of a three-compartment model that enables the estimation of apparent neural soma density using tensor-valued diffusion encoding.


Introduction
Neuroimaging enables non-invasively measuring functional and structural properties of the brain, and it is essential in modern neuroscience.Diffusion magnetic resonance imaging (dMRI), the most commonly used imaging modality for quantifying microstructural properties of the brain, measures displacements of water molecules at the microscopic level and is thus sensitive to tissue microstructure.dMRI has been used to localize microstructural alterations associated with, for example, learning (Sagi et al., 2012), healthy development (Lebel et al., 2019), ageing (Sullivan and Pfefferbaum, 2006), neurodevelopmental disorders (Gibbard et al., 2018), and neurodegenerative diseases (Zhang et al., 2009).However, accurately inferring clinically and scientifically relevant properties of tissue microstructure (e.g., cell morphology or distribution of cell types) from the measured signals remains a highly challenging inverse problem (Kiselev, 2017).
Most dMRI data analysis methods are based on signal models that express the measured signal as a function of parameters of interest and can be fit to data by numerically minimizing an objective function (Novikov et al., 2019).An essential requirement for microstructural neuroimaging methods is low rotational variance (i.e., estimated parameters should not depend on how the subject's head is oriented in the scanner).Furthermore, it is often desirable for the parameter estimates to be independent of the orientation distribution of the microscopic structures (e.g., an estimate of axon density should not depend on whether the axons are aligned or crossing).These two requirements are often achieved by acquiring high-angular resolution diffusion imaging (HARDI) data and averaging over the diffusion encoding directions, which is referred to as "powderaveraging", a term borrowed from the field of solid-state nuclear magnetic resonance (NMR).The number of acquisition directions required for a nearly rotationally invariant powder-averaged signal depends on the properties of tissue microstructure and diffusion encoding (Szczepankiewicz et al., 2019a).Fitting models to powder-averaged signals is often referred to as the "spherical mean technique" (SMT), a term introduced by Kaden et al. (2016b).While powder-averaging enables the estimation of various microstructural parameters (Jespersen et al., 2013;Lasič et al., 2014;Kaden et al., 2016b,a;Szczepankiewicz et al., 2016;Henriques et al., 2020;Palombo et al., 2020;Gyori et al., 2021), a significant amount of information is lost during averaging.
Therefore, it may be beneficial to estimate the parameters directly from full data without powder-averaging.
In recent years, microstructural parameter estimation using machine learning (ML) has received significant attention as a potential solution to issues with conventional fitting, such as slow convergence, poor noise robustness, and terminating at local minima (Golkov et al., 2016;Barbieri et al., 2020;Palombo et al., 2020;Karimi et al., 2021;Gyori et al., 2021;de Almeida Martins et al., 2021;Elaldi et al., 2021;Sedlar et al., 2021b,a;Gyori et al., 2022;Kerkelä et al., 2022).ML models can be trained to predict microstructural parameter values from data using supervised or self-supervised learning.In the context of dMRI, a particularly promising development has been the invention of spherical convolutional neural networks (sCNNs) (Cohen et al., 2018;Esteves et al., 2018).sCNNs are SO(3)-equivariant (i.e., rotating the input changes the output according to the same rotation) artificial neural networks that perform spherical convolutions with learnable filters.They theoretically enable rotationally invariant classification and regression, making them potentially well-suited for predicting microstructural parameters from dMRI data.
This study aimed to investigate if sCNNs can improve microstructural parameter estimation.We focused on estimating the parameters of a constrained two-compartment model by Kaden et al. (2016a) regularly used in neuroscience to study human white matter in vivo (Collins et al., 2019;Voldsbekk et al., 2021;Toescu et al., 2021;Rahmanzadeh et al., 2022;Battocchio et al., 2022).An sCNN implemented according to Esteves et al. (2018) was trained to predict the neurite orientation distribution function (ODF) and scalar parameters (neurite diffusivity and density) from dMRI data.Training and testing were done using simulated data.The sCNN was compared to conventional fitting and a multi-layer perceptron (MLP) in terms of accuracy and orientational variance.The trained model was then applied to MRI data acquired in a clinical setting to generate microstructural maps.Furthermore, to highlight the fact that the sCNN and training pipeline are applicable to any Gaussian compartment model, the network was trained to estimate the parameters of a constrained three-compartment model by Gyori et al. (2021) that enables the estimation of apparent neural soma density using tensor-valued diffusion encoding (Topgaard, 2017).
2 Materials and methods

Spherical harmonics
Any square-integrable function on the sphere f : S 2 → C can be expanded in the spherical harmonic basis: where x is a point on the unit sphere, b is the bandwidth of f , l is the degree, m is the order, f m l is an expansion coefficient, and Y m l is a spherical harmonic defined as where θ ∈ [0, π] is the polar coordinate, ϕ ∈ [0, 2π) is the azimuthal coordinate, and P m l is the associated Legendre function.
The expansion coefficients are given by the spherical Fourier transform (SFT): SFT of a band-limited function can be computed exactly as a finite sum using a sampling theorem (Driscoll and Healy, 1994).Equation 1 is the inverse spherical Fourier transform (ISFT).
Since reconstructed dMRI signals are real-valued and antipodally symmetric, we use the following basis: Considering that diffusion encoding directions do not usually follow a sampling theorem like the one by Driscoll and Healy (1994) that enables SFT to be exactly computed as a finite sum, we use least squares to compute the expansion coefficients: Indexing j = 1 2 l(l + 1) + m assigns a unique index j to every pair l, m.Given f sampled at points x 1 , x 2 , ..., x npoints stored in a column vector X, the values of the spherical harmonics sampled at the same points are organized in a n points × n coefficients matrix B where gives a vector containing the expansion coefficients minimizing the Frobenius norm (Brechbühler et al., 1995).

Spherical convolution
Convolution of a spherical signal f by a spherical filter h is defined as where ê3 is a unit vector aligned with the z-axis.If f and h are band-limited, the above equation can be evaluated efficiently as a pointwise product in the frequency domain (Driscoll and Healy, 1994).The spherical harmonic coefficients of the convoluted signal y are Spherical convolution is equivariant to rotations (i.e., R(f * h) = (Rf ) * h for all R ∈ SO(3)) and the filter is marginalized around the z-axis (i.e, for every h, there exists a filter h z that is symmetric with respect to the z-axis so that f * h = f * h z ).

Compartment models
Compartment models represent the dMRI signal as a sum of signals coming from different microstructural environments (e.g., intra-and extra-axonal water).For details, see, for example, the review by Jelescu and Budde (2017).Here, we focus on models with non-exchanging Gaussian compartments following an ODF.
The signal measured along n is expressed as a spherical convolution of the ODF by a microstructural kernel response function K: where K is the microstructural kernel response function: where S 0 is the signal without diffusion-weighting, N is the number of compartments, f i is a signal fraction, b is the b-tensor corresponding to n and a b-value equal to Tr(b), : denotes the generalized scalar product (Westin et al., 2016), and D i is an axially symmetric diffusion tensor aligned with the z-axis representing Gaussian diffusion in the compartment.The training pipeline presented in this paper is applicable to any compartment model that can be expressed using equations 7 and 8. Given a different data generation method, the sCNN can be trained to predict the parameters of non-Gaussian models as well.

Two-compartment model
The so-called "standard model" of diffusion in white matter consists of a one-dimensional compartment representing diffusion inside neurites and a coaxial axially symmetric extra-cellular compartment (Novikov et al., 2019).We focus on a constrained version of the model by Kaden et al. (2016a) that enables model parameters to be estimated from powder-averaged data using the SMT.The model contains two parameters: intra-neurite diffusivity d and intra-neurite signal fraction f .Axial and radial diffusivities of the extra-cellular compartment are d and (1 − f )d, respectively.Inserting this into Equation 8gives 2.3.2Spherical mean technique Kaden et al. (2016b) observed that for a fixed b-value, the spherical mean of the dMRI signal over the gradient directions does not depend on the ODF.By exploiting this invariance, the constrained two-compartment model can be fit to powder-averaged data, denoted by S PA here, using the following signal equation (Kaden et al., 2016a): , respectively.We omit explicitly writing out the kernel signal equation to save space, but it is trivial to construct from Equation 8.

Simulations
Simulated training data was generated by evaluating Equation 7 in the frequency domain according to Equation 6.The response function values were evaluated along 3072 directions uniformly distributed over the surface of the sphere according to the hierarchical equal area isolatitude pixelisation (HEALPix) (Gorski et al., 2005;Zonca et al., 2019) and expanded in the spherical harmonics basis.Rician noise was added to the simulated signals: where S is the simulated signal without noise and X and Y are sampled from a normal distribution with zero mean and standard deviation of 1/SNR, where SNR is the signal-to-noise ratio.SNR was matched to the mean SNR in the imaging experiments.

Network architecture
Our sCNN, visualized in Figure 1, consists of six spherical convolution layers implemented according to Esteves et al. (2018) without enforcing localized filters.The network takes the expansion coefficients in the frequency domain as input and outputs the estimated ODF and scalar model parameters.The number of input channels is equal to the number of shells in data.Each spherical convolution layer is followed by a leaky (slope is 0.1 for negative values) rectified linear unit (ReLU) applied in the spatial domain.The conversion between frequency and spatial domains is done using the 3072 HEALPix directions.Spherical harmonics up to degree 16 are used in the network because the non-linearity can increase signal bandwidth.
Spectral pooling discards coefficients of the highest degrees.After the initial three convolutions, global mean pooling is applied in the spatial domain, and the resulting arrays are concatenated and passed to the fully connected network (FCN) that outputs the predicted scalar parameter.The FCN consists of three hidden layers with 128 units each.The first two layers of the FCN are followed by batch normalization (Ioffe and Szegedy, 2015) and a ReLU.The sCNN for estimating the two-compartment model parameters has 78,258 trainable parameters.

Training
Training was done over 10 5 batches of simulated data generated during training.Each batch contained signals from 500 microstructural configurations produced by random sampling (d ∼ U(0, 3 µm 2 /ms) and f ∼ U(0, 1)).ODFs were sampled from five volunteer scans.Validation and test datasets were constructed similarly, except that they contained 10 4 and 10 6 microstructural configurations, respectively, and the ODFs were sampled from different volunteer scans.Training was performed twice: with and without randomly rotating the ODFs.The ODFs in the validation and test datasets were randomly rotated.ADAM (Kingma and Ba, 2014) was the optimizer with an initial learning rate of 10 −3 , which was reduced by 90% after 50% and 75% into the training.Mean squared error (MSE) was the loss function.ODF MSE was calculated in the spatial domain.

Baseline methods
The sCNN was compared to the SMT and an MLP that takes the normalized dMRI signals as inputs and outputs the spherical harmonic coefficients of the ODF and the model parameters.The SMT parameter estimation and the subsequent ODF estimation using the estimated microstructural kernel and constrained spherical deconvolution (CSD) was done using Dmipy (Fick et al., 2019).The MLP consisted of three hidden layers with 512 nodes each.The hidden layers were followed by batch normalization and a ReLU.The MLP had 614,447 trainable parameters.It was trained like the sCNN, except ten times more batches were used to account for the higher number of parameters and ensure convergence.

Imaging data
The brains of eight healthy adult volunteers were scanned on a Siemens Magnetom Prisma 3T (Siemens Healthcare, Erlangen, Germany) at Great Ormond Street Hospital, London, United Kingdom.Data was denoised (Veraart et al., 2016) using MRtrix3 (Tournier et al., 2019) and distortion-and motion-corrected using FSL (Andersson and Sotiropoulos, 2016;Jenkinson et al., 2012).SNR was estimated in each voxel as the inverse of the standard deviation of the normalized signals without diffusion-weighting.

High-angular resolution diffusion imaging
Seven volunteers were scanned using a standard clinical two-shell HARDI protocol with two non-zero bvalues of 1 and 2.2 ms/µm 2 with 60 directions over half a sphere each.Other relevant scan parameters were the following: diffusion time (∆) = 28.7 ms; diffusion encoding time (δ) = 16.7 ms; echo time (TE) = 60 ms; repetition time (TR) = 3,050 ms; field of view (FOV) = 220 × 220 ms; voxel size = 2 × 2 × 2 mm 3 ; slice gap = 0.2 mm; 66 slices; phase partial Fourier = 6/8; multiband acceleration factor = 2. Fourteen images were acquired without diffusion-weighting, one of which had the phase encoding direction reversed to be used to correct for susceptibility-induced distortions.The total scan time was 7 minutes.Mean SNR in the brain was 50.Neurite ODFs were estimated using multi-tissue CSD (Jeurissen et al., 2014) with l max = 8.

Prediction accuracy
MSE on the test dataset is reported in Table 1.The sCNN and MLP outperformed the SMT in estimating the ODF and scalar parameters.The sCNN predicted d and f the best while the MLP was predicted the ODF marginally better than the sCNN.Both the sCNN and MLP benefited slightly from randomly rotating the training data.Figure 2 shows how prediction accuracy depends on the values of d and f .The sCNN and MLP outperformed the SMT in all parts of the parameter space.Although the largest errors with the SMT occurred for values of d and f not typically observed in the brain, ML-based approaches were more accurate for values observed in the brain (i.e., d roughly between 1 and 2 µm 2 /ms).

Rotational variance
The rotational variance of the different methods was assessed by generating signals from 10 3 random microstructural configurations rotated over 729 rotations given by the SO(3) sampling theorem by Kostelec and Rockmore (2008).No noise was added to the signals to exclude the effects of noise.The average standard deviation of the estimated parameters from the rotated data are shown in Table 2.The sCNN and SMT were much less sensitive to rotations than the MLP.The SMT had the lowest rotational variance for d, and the sCNN had the lowest rotational variance for f .However, the SMT's non-zero rotational variance was driven by low values of d or f for which the fit is unstable.For values typically observed in white matter, the SMT's estimates' standard deviation was three orders of magnitude smaller than the average.Data augmentation by rotating the input signals improved prediction accuracy for both the sCNN and MLP.
However, the sCNN was much less rotationally variant even without data augmentation than the MLP was with data augmentation.

Three-compartment model
To highlight the fact that the network and training pipeline are applicable to any Gaussian compartment models, the sCNN was trained to predict the three-compartment model parameters the same way as with the two-compartment model.Informed by the two-compartment model results, the network was trained with randomly rotated training data.d i ∼ U(0, 3 µm 2 /ms), f i ∼ U(0, 1), d sph ∼ U (0, max(d i , 0.5 µm 2 /ms)), and f sph ∼ U(0, f i ).The upper limit of d sph was chosen to correspond to a sphere with a diameter of 25 µm using the Monte Carlo simulator Disimpy (Kerkelä et al., 2020).Figure 5 shows maps that the sCNN generated from preprocessed dMRI data.

Discussion
The primary purpose of this study was to investigate whether sCNNs can improve microstructural parameter estimation from noisy dMRI data, focusing on a constrained two-compartment model widely used in neuroscience research to study human white matter in vivo.The sCNN demonstrated superior accuracy with similar rotational variance compared to the SMT, and exhibited similar accuracy but considerably lower rotational variance than the MLP that had significantly more trainable parameters.Our results show that sCNNs can offer substantial benefits over simpler artificial neural network architectures for ML-based microstructural parameter estimation from dMRI data.
We focused on comparing neural network architectures with a fixed training strategy, using the SMT as a baseline.Previous research by Gyori et al. (2022) has highlighted the significant impact of training data distribution on neural network predictions, which affects the performance of our sCNN when applied to real imaging data.We are aware of this limitation, and in future work, we aim to optimize the training data distribution.Another relevant key takeaway from the work by Gyori et al. ( 2022) is that at low SNR, ML-based parameter estimation can suffer from high bias, which manifests as maps that appear exceedingly smooth.Moreover, it is important to note the general limitation of microstructural models that deviations from model assumptions can lead to inaccuracies (Lampinen et al., 2017;Henriques et al., 2019;Kerkelä et al., 2021).
When it comes to training the sCNN, while it is crucial to sample the space of possible ODFs as ex- To the best of our knowledge, sCNNs have been used to analyze dMRI data only a few times prior to this.Sedlar et al. (2021a) trained an sCNN to predict 'neurite orientation dispersion and density imaging' (NODDI) (Zhang et al., 2012) parameters from subsampled data, and Goodwin-Allcock et al. (2022) showed that sCNNs can improve the robustness of diffusion tensor estimation from data with just a few directions.
sCNNs have also been used to estimate ODFs (Sedlar et al., 2021b;Elaldi et al., 2021).However, this study differs from the aforementioned studies in two important ways.First, our network and simulations were developed to estimate both the ODF and scalar parameters of any Gaussian compartment model.Second, we carefully compared the sCNN to the SMT, a commonly used and nearly rotationally invariant conventional fitting method, thus warranting a comparison with sCNN.Although we implemented spherical convolution layers as described by Esteves et al. (2018), other architectures also exist and warrant investigation in the context of microstructural parameter estimation.For example, the sCNNs by Cohen et al. (2018) use cross-correlation and can learn non-zonal (i.e., not symmetric with respect to the z-axis) filters, Kondor et al. (2018) developed efficient quadratic nonlinearities in the spherical harmonics domain, and the graphbased sCNN by Perraudin et al. (2019) is suitable for spherical data with very high angular resolution.
Besides optimizing network architecture, future studies should also focus on optimizing hyperparameters and especially on carefully assessing the effects of and optimizing the training data distribution.
) 2.3.3Three-compartment model Palombo et al. (2020) added a spherical compartment representing neural soma to the standard model to make it more suitable for gray matter.We use a constrained three-compartment model by Gyori et al. (2021) that uses tensor-valued diffusion encoding to make apparent neural soma imaging more feasible without high-performance gradient hardware.The model contains four parameters: intra-neurite diffusivity d i , intra-neurite signal fraction f i , spherical compartment diffusivity d sph , and spherical compartment signal fraction f sph .Axial and radial diffusivities of the extra-cellular compartment are

Figure 1 :
Figure1: Network for two-compartment model parameter prediction.The input is normalized two-shell data expanded using spherical harmonics up to degree eight.The signals undergo spherical convolutions, non-linearities, and spectral pooling to produce the predicted orientation distribution function.After the initial three convolutions, global mean pooling is applied in the signal domain, and the resulting arrays are concatenated to create a nearly rotationally invariant feature vector passed on to the FCN that outputs the predicted scalar parameter.
gradient waveforms encoding linear and planar btensors.The acquisitions with linear b-tensors were performed with b-values of 0.5, 1, 2, 3.5, and 5 ms/µm 2 with 12, 12, 20, 20, and 30 directions over half a sphere, respectively.The acquisitions with planar b-tensors were performed with b-values of 0.5, 1, and 2 ms/µm 2 with 12, 12, and 20 directions over half a sphere, respectively.Other relevant scan parameters were the following: TE = 82 ms; TR = 4.2 s; FOV = 220 × 220 ms; voxel size = 2 × 2 × 2 mm 3 ; slice gap = 0.2 mm; 66 slices; phase partial Fourier = 6/8; multiband acceleration factor = 2. Fourteen images were acquired without diffusion-weighting, one of which had the phase encoding direction reversed.The total scan time was 12 minutes.Mean SNR in the brain was 29.Mean squared error of the estimated two-compartment model parameters on the test dataset.Deep learning-based parameter estimation outperformed the spherical mean technique.The asterisk (*) refers to models trained with randomly rotated training data.The lowest values are highlighted in bold.

Figure 2 :
Figure 2: Mean squared error of the estimated two-compartment model parameters on the test dataset for different values of intra-neurite diffusivity (d) and intra-neurite signal fraction (f ).The first row (A-E) shows the results for d and the second row (F-J) shows the results for f .Deep learning-based methods outperformed the spherical mean technique in all parts of the parameter space.The asterisk (*) refers to models trained with randomly rotated training data.

Figure 3 :
Figure3: Axial slices of the intra-neurite diffusivity (A-C) and intra-neurite signal fraction (G-I) maps generated using the spherical convolutional neural network, multi-layer perceptron, and spherical mean technique.The second row (D-F) shows the differences between the intra-neurite diffusivity maps and the fourth row (J-L) shows the differences between the intra-neurite signal fraction maps.

Figure 4 :
Figure 4: Neurite orientation distribution functions overlaid on a map of intra-neurite signal fraction generated by the spherical convolutional neural network trained with randomly rotating the training data.The colour represents the principal direction, and the size is scaled according to neurite density.This coronal slice shows the intersection of the corticospinal tract and the corpus callosum.

Figure 5 :
Figure 5: Axial slices of the intra-neurite diffusivity (d i ), spherical compartment diffusivity (d sph ), intraneurite signal fraction (f i ), and spherical compartment signal fraction (f sph ) maps generated by the spherical convolutional neural network trained with randomly rotating the training data.
haustively as possible during training, the MLP training requirements are even more demanding since its rotational variance can only be reduced through learning.Changes in b-values or the angular resolution of shells will necessitate retraining our network.Technically, the same network could be used as long as the b-values remain consistent, but the spherical harmonics expansion would vary with different angular resolutions (i.e., the number of b-vectors).

Table 2 :
Average standard deviation of the estimated two-compartment model parameters over rotations of the input signals.The asterisk (*) refers to models trained with randomly rotated training data.The lowest values are highlighted in bold.