A Heavy Tailed Expectation Maximization Hidden Markov Random Field Model with Applications to Segmentation of MRI

A wide range of segmentation approaches assumes that intensity histograms extracted from magnetic resonance images (MRI) have a distribution for each brain tissue that can be modeled by a Gaussian distribution or a mixture of them. Nevertheless, intensity histograms of White Matter and Gray Matter are not symmetric and they exhibit heavy tails. In this work, we present a hidden Markov random field model with expectation maximization (EM-HMRF) modeling the components using the α-stable distribution. The proposed model is a generalization of the widely used EM-HMRF algorithm with Gaussian distributions. We test the α-stable EM-HMRF model in synthetic data and brain MRI data. The proposed methodology presents two main advantages: Firstly, it is more robust to outliers. Secondly, we obtain similar results than using Gaussian when the Gaussian assumption holds. This approach is able to model the spatial dependence between neighboring voxels in tomographic brain MRI.


INTRODUCTION
The segmentation of brain MRI consist in the parcellation of the brain areas into their main tissue components: white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Segmentation is an important preprocessing step in many brain image applications and it is currently an active field of research (Ortiz et al., 2012(Ortiz et al., , 2014Beare et al., 2016;Mulder et al., 2017;Serag et al., 2017).
A wide range of segmentation approaches assumes that the distribution of the histogram of intensity for each brain tissue can be modeled using a Gaussian distribution or a mixture of Gaussians (Laidlaw et al., 1998;Ruan et al., 2000). In Leemput et al. (1999) and Zhang et al. (2001), apart from the Gaussian assumption, spatial information is included in the mathematical model considering a hidden Markov random field.
Additional works using a Gaussian mixture model for describing the brain tissue histograms are Ashburner and Friston (2005), Greenspan et al. (2006), Ashburner and Friston (2007), and Merisaari et al. (2009). da Silva (2009) presents a Markov chain sampling technique for exploring normal mixture models when the numbers of components are unknown. In that work, a Gaussian mixture model with more than three components is used to explain the three brain tissues: GM, WM, and CSF.
Gaussian assumption and the hidden Markov random field model are currently being used as a segmentation procedure for magnetic resonance images (Xia et al., 2016;Liang et al., 2016;Nguyen et al., 2017;Wang et al., 2017). Nevertheless, this assumption presents some drawbacks that were pointed out in Salas-Gonzalez et al. (2013b). Assigning an unique Gaussian distribution to each brain tissue histogram is not accurate, as the histogram of white and gray matter exhibits heavy tails and certain degree of asymmetry. This problem is usually circumvented assigning more than one Gaussian distribution to each histogram, but, even using this approach, this procedure has also two additional drawbacks: Firstly, it is hard to identify the the boundary between the GM and WM histograms, specially when two small weighted Gaussian distributions are competing to explain data in that area. Secondly, a mixture of Gaussians does not exhibit heavy tails. In this work, we show that αstable distributions are suitable to model the brain tissues more accurately.
A random variable with α-stable distribution fulfill the following property: a linear combination of two independent copies of the variable has the same distribution. Furthermore, the stable distributions are completely described by only four parameters [the characteristic exponent (α), degree of asymmetry (β), dispersion (γ ) and location (δ)]. The Gaussian distribution is a particular case of α-stable distribution and, along with the Cauchy and Lévy distributions, they are the only α-stables probability density functions which can be written in closed form Samorodnitsky and Taqqu (1994). This is the reason why α-stable probability density functions (pdf) are evaluated numerically. These properties confer to the stable distribution the ability to fit asymmetric and heavy tailed data better than the Normal distribution (Salas-Gonzalez et al., 2009, 2010. The α-stable distribution has been historically used in many fields of research (Physics, Economics or Engineering among others), but only recently it has been used for applications in brain image processing (Salas-Gonzalez et al., 2013a, 2015. In addition, the α-stable finite mixture model has been also used in the parametrization of GM and WM in MRI (Salas-Gonzalez et al., 2013b). In that work, it was stated that the α-stable distribution is able to fit the brain image tissues more accurately than the Gaussian distribution. Nevertheless, the finite mixture model performs an analysis of the histogram of intensity values, and it does not include spatial information in the model. Thus, that approach can only be used as a thresholding segmentation method.
In this article, we extend the work published in Salas-Gonzalez et al. (2013b). Specifically, we present a hidden Markov random field model with expectation maximization modeling the components using the α-stable distribution. As the Gaussian model is a particular limiting case of the α-stable distribution, the proposed model is a generalization of the widely used EM-HMRF assuming Gaussian distributions in the histogram of each component. The proposed methodology is tested in two datasets: synthetic data and a real problem of MRI segmentation of GM and WM.
This work is organized as follows: section 2 presents the hidden Markov random field algorithm proposed in this article along with the T1-weighted MRI database. The results are given and discussed in sections 3 and 4. Finally, the conclusions are drawn in section 5.

Hidden Markov Random Field with α-Stable Distributions
The goal of the mathematical model is to assign each voxel in the image to one of the 2 components of the α-stable mixture model using a maximum a posteriori criterion. A similar model was presented in Wang (2012) assuming Gaussian distributions: (1) y is the 3D MRI image and x is a 3D matrix with same size as y containing information about the voxel assignment or allocation ( x i = 1 for GM and x i = 2 for WM). In addition, P(y|x) is the likelihood probability of the observation and P(x) is the prior probability of the class (GM or WM). P(y|x) is estimated assuming an α-stable distribution for the histogram of intensity values. As instance, let consider that a voxel x i is allocated to GM (assigning x i = 1) or WM (x 2 = 2). Thus, the α-stable parameters of the GM histogram, can be calculated by selecting all the voxels that fulfill the condition x i = 1 and performing likelihood estimation of the α-stable parameters (Nolan, 1997), obtaining θ 1 = {α 1 , β 1 , γ 1 , δ 1 }. The same procedure is used for the estimation of the α-stable parameters of the WM histogram, which we denote as θ 2 = {α 2 , β 2 , γ 2 , δ 2 }. The likelihood probability of the observation P(y|x) can be estimated straightforwardly. As instance, for a given intensity value y i from the MRI brain image y, the likelihood is P(y i |θ 1 ) and P(y i |θ 2 ). P( | ) denotes the α-stable probability density function which is estimated numerically.
Under a Markov random field framework, these two probabilities are derived from: and where Z is a normalization constant (the partition function) and U denotes the energy function.
In this framework, we introduce spatial information in the model by means of the prior energy function U(X): where V c (X) is the clique potential and C is the set of all possible cliques (the 6 nearest voxels in the 3D space). The clique potential is defined on pairs of neighbors as: with β C is a coefficient which models the contribution of the prior energy U(X) in the posterior energy term. We choose β C = 0.5 as we obtain a good performance for the applications studied in this work using this value. Increasing this β C parameter we increase the contribution of the neighboring voxels in the estimation of the class of a given voxel. In addition, the contribution of the pdf used for modeling the histogram data decreases when we increase the β C parameter. Therefore, we would obtain the same accuracy values in the segmentation of images regardless of the chosen pdf, as only U(X) would contribute in the calculation of the posterior energy U(X|Y). On the other hand, in the limiting case when β C = 0 the HMRF model transforms in the finite mixture model studied in Salas-Gonzalez et al. (2013b). We set V C = 0 if a neighbor voxel belongs to a different tissue type and V C = 1 if the class type of the voxels x i and x j are the same. This summation is taken over all 6 couples of voxels that are nearest neighbors in the image grid.
The joint likelihood of the model is where f α i ,β i (y i |γ i , µ i ) denotes an α-stable pdf which can be evaluated numerically. Parameter values depend on the allocation of variable x i . Therefore, we could also write: In addition, as Z is a constant normalization term, the log-likelihood of the model is related to the likelihood energy log P(x|y) ∝ −U(x|y), and, therefore, the maximum a posteriori estimation of the parameters is equivalent to the minimization of the posterior energy function.

Expectation-Maximization Algorithm
The expectation-maximization approach is an iterative procedure for obtaining the unknown parameters of the model. This approach is performed in two differentiate steps. Firstly, we initialize the parameters of the data θ (0) , and then:

1) Expectation
Let θ (t) be the current parameter values. We calculate the likelihood distribution for each of all possible labels (in this work j = 1, 2).
In addition, we estimate the weights of the model w j as the proportion of the voxels allocated to component j.

2) Maximization
We maximize the conditional expectation in order to update the parameter values: Using the current values of the parameters, we calculate the allocation of variablesx via a maximum a posteriori estimation.

MRI Database
The set of 18 images used in this article were obtained from the Internet Brain Segmentation Repository 1 (IBSR) which provides manually-guided expert segmentation results along with magnetic resonance brain image data.
The MRI image acquisition, according to the information provided by the Internet Brain Segmentation Repository documentation, follows this procedure: The coronal threedimensional T1-weighted spoiled gradient echo MRI scans were performed on two different imaging systems. Ten FLASH scans on four males and six females were performed on a 1.5 tesla Siemens Magnetom MR System (Iselin, NJ) with the following parameters: TR = 40 ms, TE = 8 ms, flip angle = 50 • , field of view = 30 cm, slice thickness = contiguous 3.1 mm, matrix = 256 × 256, and averages = 1. Ten 3D-CAPRY scans on six males and four females were performed on a 1.5 tesla General Electric Signa MR System (Milwaukee, WI), with the following parameters: TR = 50 ms, TE = 9 ms, flip angle = 50 • , field of view = 24 cm, slice thickness = contiguous 3.0 mm, matrix = 256 × 256, and averages = 1.
Initially, we selected only the WM and GM voxels in the images. We studied the improvement of the hidden Markov random field model with α-stable distributions over the HMRF assuming Gaussian which is usually considered in the literature. Specifically, the goal of this study is to evaluate how this distribution assumption affects the segmentation in the boundary between GM and WM.
The HMRF model used for segmentation purposes requires the existence of a segmented image for the initialization of the algorithm. This initial segmented image is updated iteratively by the EM and the maximum a posteriori estimation steps using the hidden Markov random field algorithm presented in section 2.1. In this work, the initial segmented brain image x is built from the dataset using a leave-one-out procedure. As instance, for the image number i, we select the manually segmented Frontiers in Neuroinformatics | www.frontiersin.org images available in the IBSR dataset for all the images but the current image i. Then, we perform the affine registration of these segmented images to the current source image i. Finally, using the remaining 17 brain images, we build a tissue probability map in the spatial space of the image i. It is considered that a voxel initially belongs to a given tissue if allocation probability to this tissue is the maximum.

Synthetic Data
We test the hidden Markov random field model to perform the segmentation of a synthetic 3D image with impulsive α-stable noise. This image is built as an empty sphere centered in a threedimensional box with dimension 20 × 20 × 20 = 8,000 voxels. The background and inner sphere are built by a stable random distribution with parameters θ 1 = {α 1 , β 1 , γ 1 , δ 1 } = 1.41100. The rest of the image is built using a random variable with θ 2 = {1.801020}. The transaxial slices showing the preliminar position of the first and second component regions are plotted in Figure 1.
In addition, Figure 2 shows a montage with all the slices of the source image, but, in that case the synthetic 3Dimage includes α-stable random noise. This synthetic data is very impulsive, as instance, their values range from −180 to 3,480. For the sake of clarity, we choose a grayscale colormap with −50 and 50 for white and black, respectively. Although there are 10 voxels with intensity values lower than −50 and 269 with intensity values greater than 50.
We have performed the segmentation of this synthetic image in Figure 2 using the EM-HMRF model with α-stable distributions. Figure 3 shows the segmentation results after 20 iterations. The sum of the Log-Likelihood in each iteration when the α-stable model is used is depicted in Figure 4. As we can notice, convergency is reached after 7 iterations. The histogram of the voxel intensity values is also depicted in Figure 5 where the predicted α-stable density is plotted. The components of the mixture are very mixed and the overall histogram of intensity values seems clearly unimodal. Despite of that fact, predicted densities fit very accurately the truth distributions of the two regions .
We have performed the same analysis and segmentation procedure but considering a EM-HMRF with Gaussian distributions. Figure 6 shows the results when the Gaussian model is considered. In that case, Gaussian assumption does not lead to satisfactory results due to the presence of outliers in the model. Specifically, one component tries to fit the bulk of the distribution modeling most of the voxels in the image, while the second component, which has a very large variance, tries to model the outliers in the image. In addition, we obtain 96% of accuracy when the α-stable EM-HMRF is used and 85% with the Gaussian. Mainly, the Gaussian model fails in the segmentation of the inner part of region 1. This behavior is highlighted in  One of the main advantages of the α-stable distribution upon the Gaussian is that the later is a particular case of stable distribution. Therefore, the approach presented in this paper can be viewed as a generalization of the EM-HMRF with Gaussian distributions. In order to test the flexibility of this proposed model and its ability to work in Gaussian noise environments, we have tested the performance of our method by segmenting synthetic data with additive Gaussian noise. For that reason, we repeat the previous analysis but using a synthetic 3D image with Gaussian noise. Again, the image is an empty 3D sphere centered   in a box with dimension 20 3 = 8,000 voxels. The outer and inner region is built by a Gaussian random variable with µ 1 = 0 and σ 1 = 14.14 while the region number two is randomly sampled from a Gaussian with values µ 2 = 20 and σ 2 = 14.14. Figure 8 shows a montage with all the slices of the source image. The distribution of this synthetic image is formed by a mixture of two Gaussian components and their values range from −49.4 to 67.1.
We obtained the same accuracy in the segmentation results (95.4% of voxels correctly identified) using the α-stable or Gaussian model. Specifically, we got the following parameter values for the distribution of the first and second region: θ 1 = {2, −9.62, −0.15} and θ 2 = {2, −9.60, 20.8}. As a result of this experiment, we have proved that Gaussian parameter values were estimated accurately, as the α-stable dispersion parameter and the Gaussian standard deviation σ are related with the formula σ = √ 2γ . The β parameter is undefined when α = 2.
Histogram of voxel intensity values and the predicted density are plotted in Figure 9. Again, as in the first example, the two original components are quite mixed.

MRI Data
Recently, we have published a study of the histogram of GM and WM intensity values (Salas-Gonzalez et al., 2013b).
There, we restricted our study to the global analysis of histograms and we did not include any spatial information in the model. In this work, we show how the HMRF can be used to enhance this previous work. Thus, the proposed α-stable methodology is used to include spatial information in the model by means of the neighboring relations between voxels. This information can be used to perform the segmentation of gray matter and white matter tissues in brain magnetic resonance images. We test the EM-HMRF α-stable algorithm in a dataset of 18 magnetic resonance images from the Internet Brain Segmentation Repository and compare the proposed methodology with respect to the Gaussian assumption. Table 1 shows the accuracy segmentation results obtained using the EM-HMRF α-stable model. We compare these results with the assumption of a Gaussian distribution for the GM and WM histograms (Zhang et al., 2001). Figure 10 shows a box plot with the accuracy results and the Dice similarity coefficient in Table 1. As usual, the central mark is the median, and the edges of the box are the 25th and 75th percentiles. The bars extend to the most extreme data points. This plot shows that the HMRF α-stable algorithm performs better than Gaussian version of the algorithm. Specifically, the following median accuracy values are obtained: 0.9141 (α-stable EM-HMRF) and 0.8936 (Gaussian EM-HMRF). Figure 11 shows the manual segmentation (Manual) considered the ground truth and the segmentation results (Stable) in a transaxial slice for each of the 18 brain images from the IBSR dataset. This figure allows us to visually inspect the performance of the heavy tailed HMRF segmentation method.

DISCUSSION
The method presented in this work, along with the previous work in parameterization of the brain tissues can be used to develop novel brain image processing methods under a heavy-tailed assumption. Let note that, in this paper, we do not focus on building a complete segmentation methods with several different preprocessing steps or parameters to calibrate. Instead of that, we study and isolate one common assumption which is made by many segmentation approaches in the literature, as instance: the Gaussian distribution can be used to fit the histogram of intensity values of the GM and WM tissue in MRI. We showed in this work that Some advantages of the α-stable EM-HMRF method can be summarized as follows: • It allows us to model each histogram of brain tissues using only one distribution. • It allows us to deal with heavy-tailed data.
• It allows to fit asymmetric distribution in a parsimonious way. • As the Gaussian is a limiting case of the α-stable distribution, we expect to obtain the same results when the Gaussian assumption is correct. This was found to be the case when we applied the α-stable EM-HMRF to synthetic data.

Future Work
• The α-stable HMRF approach opens a potential direction to develop novel applications in neuroimage in the future. As instante, the applications of the mathematical model presented here is not limited to the segmentation of magnetic resonance FIGURE 10 | Segmentation results: accuracy and Dice similarity coefficient.
Frontiers in Neuroinformatics | www.frontiersin.org images. Other image modalities with heavy-tailed distribution can also benefit from the algorithm presented here, as instance FP-CIT SPECT images used for Parkinson's disease diagnosis were found to present also heavy tails (Salas-Gonzalez et al., 2013a). • An additional step will be to extend the algorithm in order to include CSF in the model. Nevertheless, the histogram of their intensity values are not heavy-tailed, therefore, a different strategy should be envisaged for this tissue.

CONCLUSION
In this work, we have presented an expectation maximization hidden Markov random field algorithm for the segmentation of white and gray matter in magnetic resonance images. In order to achieve this goal, distributions of intensity values of GM and WM have been modeled using α-stable distributions. This method has been tested using synthetic and real brain MRI data. We have compared the proposed methodology to the modelization of these tissue distributions using Gaussians. We have obtained a better performance in terms of accuracy when the α-stable model is used.