A Gaussian Mixture Model Approach for Estimating and Comparing the Shapes of Distributions of Neuroimaging Data: Diffusion-Measured Aging Effects in Brain White Matter

Neuroimaging signal intensity measures underlying physiology at each voxel unit. The brain-wide distribution of signal intensities may be used to assess gross brain abnormality. To compare distributions of brain image data between groups, t-tests are widely applied. This approach, however, only compares group means and fails to consider the shapes of the distributions. We propose a simple approach for estimating both subject- and group-level density functions based on the framework of Gaussian mixture modeling, with mixture probabilities that are testable between groups. We demonstrate this approach by application to the analysis of fractional anisotropy image data for assessment of aging effects in white matter.


INTRODUCTION
Fractional anisotropy (FA), a scalar measure derived from diffusion tensor imaging (DTI), indexes the degree of anisotropy of water diffusion in brain tissue. In normal white matter (WM), water diffusion is highly constrained to predominantly move parallel to the long axis of axon bundles, or nerve fibers. High FA is thus characteristic of normal WM and may be an indicator of its health. Low FA, on the other hand, can reflect loss of WM microstructural elements in tissues, which normally have high FA and may be an indicator of disease. Low WM FA is associated with demyelinating disease, dementia, traumatic brain injury (TBI), and normal aging.
Inter-subject variability of WM microstructure has been reported with normal subjects as well as patients with various neurodegenerative diseases (1)(2)(3)(4)(5)(6). Neurodegeneration is also a feature of normal aging and WM effects of aging disrupt cerebral connectivity, leading to cognitive dysfunction (7,8). Commonly applied statistical analyses, e.g., t -tests at each voxel, may be inherently insensitive to disease pathology due to inter-subject spatial variation (2)(3)(4)9). A whole brain (or whole WM) histogram approach has been used in some studies to address this limitation (10,11). For example, Benson et al. (10) estimated kurtosis, skewness, peak height, and mean from histograms of WM FA in TBI patients and normals. They aggregated these measures to test for group differences in the shapes of the individuals' histograms. This analysis demonstrated that the FA distribution of TBI patients exhibited higher kurtosis, higher peak height, and greater skew toward higher FA, but lower mean, compared to those derived from controls. Although this approach used standard statistical summary measures, differences among the shapes of distributions between groups are not easily understood using these summary measures. In addition, these summary measures are not proper for description of multimodal distributions, a special case, which could result from a mixture of multiple distributions (12). Therefore, we propose that estimation and comparison of density functions between groups based on a mixture distribution approach will be more relevant to brain imaging data, which can exhibit unusual distributions.
In this study, we propose a simple approach to estimate subjectlevel density functions. The technique is based on a Gaussian mixture model (GMM), which assigns subject-specific mixing probabilities to latent underlying Gaussian densities a posteriori to characterize an overall distribution of each subject. Estimation of group-level density functions will be based on the estimated subject-level density functions. Differences between groups therefore only depend on the composition of mixture probabilities to underlying Gaussian densities, which lead to an easy and intuitive comparison between groups. For instance, in a simple GMM assuming two Gaussian density components with equal variance constraint, a mixture density function with higher mixing probability to the Gaussian components with lower mean is characterized as a distribution with lower mean and positive skew, while www.frontiersin.org the opposite is characterized as a distribution with higher mean and negative skew. Additionally, the mixture density function with mixing probabilities (1/2, 1/2) is a symmetric distribution with the highest variance in the data. We apply the proposed method to normal control subjects, to examine the effects of aging on the brain-wide distribution of FA.

GENERAL GMM
A general form of a GMM with m components can be expressed as follows: where Z ij is FA measurement of the j-th voxel (j = 1, . . ., N ) observed from the i-th subject (i = 1, . . ., n). The density function f(.) is assumed to be a convex combination of m latent is the standard normal density function, and θ = (µ k, σ k , and τ k , k = 1, . . ., m). The likelihood function based on model (1) is expressed as follows: We assume that the variances are the same across the m-Gaussian densities, i.e., σ 1 = . . .σ m = σ, which reduces φ k (k = 1, in Eq. 1. The m latent Gaussian densities then differ only by their centers. We order the centers of the m-Gaussian densities as µ 1 < . . . < µ m . This parameterization gives easy interpretation of results for comparison of density functions across subgroups. For example, a density function with higher mixing probabilities for low order Gaussian densities will have a smaller mean than that with higher mixing probabilities for higher order Gaussian densities; a density with mixing probability of 1/2 to each of the two Gaussian densities (lowest, highest) will have the largest variance than any other combination of mixture probabilities.
To estimate the parameters θ of the general GMM with equal variance constraint, we applied an expectation maximization (EM) algorithm (13); of note, a Bayesian mixture modeling approach (14,15) is an alternative approach. Specifically, the EM algorithm treats the mixture model in Eq. (1) as an incomplete likelihood function with missing membership information for each observation. In each iteration of the EM algorithm, expectation-step (E-step) computes expectation of complete specification of loglikelihood function with respect to membership values with given data and estimated parameters, and maximization-step (M-step) computes maximum likelihood estimates of parameters specified in the likelihood function with the expected membership values (16). The applied EM algorithm is provided in Table 1. The total number of parameters with equal variance condition for m-Gaussian densities is 2m [=m (means) + m − 1 (mixing probabilities) + 1 (variance)]. To determine the optimal number Probability density function A variable noted with (t) is the estimated value for the variable at the t-th iteration.
of the latent Gaussian densities, φ k 's (k = 1, . . ., m), we used the Akaike information criterion (AIC), which is expressed as . So that the optimal number of Gaussian densities (m) is associated with a minimum AIC value.

ESTIMATION OF SUBJECT-AND GROUP-LEVEL DENSITY FUNCTIONS
In this section, we propose a method to estimate subject-and group-level density functions and their associated mixing probabilities. This approach fits the general GMM of Eq. 1 to the full dataset (Z ij , i = 1, . . ., n; j = 1, . . ., N ), all voxels from all subjects, and estimates subject-and group-level density functions a posteriori.
Under the general GMM, the membership probability of z ij to k-th Gaussian density, denoted by τ ij (k), is obtained based on Bayes theorem as follows: which results in m k=1 τ ij (k) = 1.
The estimated subject-level density function f i for the i-th subject can be expressed as follows: with φ k (z ij ) (k = 1, . . ., m) estimated from the general GMM Eq.
(1) using all voxel data from all subjects. The parameter τ i (k) in Eq. 3 is the mixing probability of the k-th Gaussian density for the i-th subject, which we estimate as an average of membership probabilities of Z ij (j = 1, . . ., N ) as follows:

Frontiers in Public Health | Epidemiology
where τ ij (k) was estimated based on Eq. 2. Similarly, the estimated group-level density function f g * for the g -th group can be expressed as follows: where τ g * (k) in Eq. 4 is the mixing probability of the k-th Gaussian density for the g -th group, which we estimate as an average of τ i (k) of the subjects in the g -th group as follows: where n g is the number of subjects in the g -th group.

EXAMPLE SUBJECTS
The Albert Einstein College of Medicine Institutional Review Board (IRB) approved and monitored this study. Twenty-eight normal subjects without history of head injury, cardiovascular or cerebrovascular disease, diabetes, or neurological or psychiatric disease were recruited between August 2006 and May 2010 through advertisements. Demographic data for the 28 normal subjects are summarized in Table 2.

DEMONSTRATION OF THE PROPOSED METHODS USING EXAMPLE FA IMAGE DATA
Fractional anisotropy datasets were normalized prior to fitting the proposed models. The normalization was necessary because it is well-known that FA is heterogeneous across brain regions; for example, higher FA is characteristic of deep WM such as the corpus callosum, and lower FA is typical of peripheral WM. Specifically, we normalized each FA measurement x ij by z ij = with mean x j and standard deviation (s j ) of n subjects (i = 1, . . ., n) at each voxel (j = 1, . . ., N ).
Subjects were divided into four age groups: 20-29, 30-39, 40-49, and 50-59 years. Each age group consisted of seven subjects (four women and three men). Age groups were matched for years of education, differing by no more than 2 years; no significant difference in years of education was detected among age groups (p = 0.76). Two Gaussian densities were required for the approach based on the AIC. In Figure 1, estimated density functions across the entire control group (n = 28) and each of the four age groups are presented f (z) , f g * (z) , g = 1, 2, 3, 4 . Inference on the difference in shapes of FA distribution across age groups was performed by estimating mixing probabilities from the proposed estimation method; these results are shown in Table 3 and Figure 2. We order estimated Gaussian densities (φ k ) by their centers, in ascending manner; thus, we identify φ 1 as the Gaussian density with the lowest center. Mixing probabilities of the two Gaussian densities, k = 1 and k = 2, show opposite patterns of change across age groups. As age increases the mixing probability of the first Gaussian density (k = 1) increases while that of the second Gaussian density (k = 2) decreases. Box plots describing distributions of subjectlevel estimated mixing probabilities to the second Gaussian density [τ i (k = 2), i = 1, . . ., n] are provided for all age groups in Figure 2. While mixing probabilities to the second Gaussian density (k = 2) were lower for older age group in that lower mean FA for older age group, higher between-subject variance is noted in Figure 2. A Kruskal-Wallis test was performed to compare subject-level mixing probabilities between Group = 1 and Group = g (g = 2, 3, 4). A significant difference in the mixing probability was found between Groups 1 (20-29 years) and 4 (50-59 years) with p = 0.048 with degrees of freedom (1,12). This significantly lower mean mixing probability to the Gaussian density with a greater mean implies that FA declines significantly over the age of 50 years. This pattern also implies lower intra-subject variance in FA distribution for the age group (50-59 years) because of higher mixing probability to www.frontiersin.org the first Gaussian density (k = 1). While all age groups showed positively skewed FA distributions (Table 3), the youngest group, aged 20-29 years, showed the greatest skewness to the right. However, the Kruskal-Wallis test, which compares all four groups was not significant (p = 0.141, df = 3, 24).

DISCUSSION
The proposed approach for subgroup density estimation with GMM allows group comparison based on shape parameters represented by the relative mixing probabilities of the latent Gaussian densities. Since Gaussian densities are estimated based on the measurements from all voxels of all subjects, estimated individual densities will differ only by their own mixing probabilities. Individual densities are thus characterized by comparing the estimated mixing probabilities. Although kernel density estimation (KDE)  determining the shape of a density function by KDE is only the bandwidth for the chosen kernel function. In contrast, mixing probabilities estimated in the present study enable comparison of shapes between groups, which we have demonstrated with a real FA data set. The GMM approach proposed herein was designed to discriminate density functions across different subgroups. This approach implicitly assumes that the density function derived from all voxels will represent a mixture of at least two Gaussian components. Testing for differences in the shapes of the density functions from different groups is then possible by testing the means of mixing rates assigned to m-Gaussian densities (where m ≥ 2) between groups. However, if one Gaussian density function (i.e., m = 1) is sufficient to fit the entire distribution, discrimination of density functions between different subgroups is not available with the proposed approach. Additionally, the proposed GMM is a highly parsimonious approach in that it does not incorporate any subject-or group-specific shape parameters in the likelihood function; further improvement may be achieved by their inclusion.
A GMM with unequal variance assumption resulted in little difference in fitting performance compared with the proposed approach with equal variance assumption (data not shown). Nevertheless, application of non-Gaussian mixtures could be employed, study of which is beyond the scope of the present paper and should serve as a future study.
Initial application of the method to human FA datasets revealed that the distribution of FA differs significantly between subjects aged 20-29 years and those aged 50-59 years. Age-related change in FA distribution was found in mean, variance (intra-subject, between-subject), and skew. Since aging is a feature of neurodegeneration, this finding may have an implication to other neurodegenerative diseases, e.g., dementia and Alzheimer's disease. Since the present demonstration is based on a small sample, further examinations with larger samples are warranted to fully characterize the utility of this approach and the age effects it has revealed. www.frontiersin.org