Frontal Cortex Segmentation of Brain PET Imaging Using Deep Neural Networks

18F-FDG positron emission tomography (PET) imaging of brain glucose use and amyloid accumulation is a research criterion for Alzheimer's disease (AD) diagnosis. Several PET studies have shown widespread metabolic deficits in the frontal cortex of AD patients. Therefore, studying frontal cortex changes is of great importance for AD research. This paper aims to segment the frontal cortex from brain PET imaging using deep neural networks. A learning framework called Frontal cortex Segmentation model of brain PET imaging (FSPET) is proposed to tackle this problem. It incorporates the anatomical prior of the frontal cortex into the segmentation model, which is based on a conditional generative adversarial network and a convolutional auto-encoder. The FSPET method is evaluated on a dataset of 30 brain PET images with ground truth annotated by a radiologist. FSPET outperforms the other baselines, which demonstrates the effectiveness of the framework.


INTRODUCTION
Alzheimer's disease (AD) is a progressive disease that destroys memory and other important mental functions. As of 2019, it ranked as the sixth leading cause of death in China (Vos et al., 2020). There are more than 10 million patients with AD in China, the country with the most AD patients in the world (Jia et al., 2020).
AD is usually diagnosed based on clinical manifestations. Nowadays, medical imaging, including computed tomography (CT), magnetic resonance imaging (MRI), single-photon emission computed tomography (SPECT), and positron emission tomography (PET), can help doctors understand the pathophysiology of AD, for example, Aβ plaques, neurofibrillary tangles, and neuroinflammation. Moreover, the pathophysiology of AD is believed to start years ahead of clinical observation, so imaging can help detect AD earlier than conventional diagnostic tools (Marcus et al., 2014).
Among the above medical imaging techniques, PET/CT is a nuclear medicine technique that combines a PET scanner and a CT scanner to acquire sequential images from both devices in the same session, which are combined into a single superposed image. Figure 1 shows a brain PET/CT fusion image. The first row is the PET imaging, the second row is the CT imaging, and the third row is the PET/CT fusion imaging. Each row shows, from left to right, (a) the coronal section, (b) the median sagittal section, and (c) the transverse section.
18F-FDG PET imaging of brain glucose use and amyloid accumulation is a research criterion for AD diagnosis (Berti et al., 2011). Several 18F-FDG PET studies have been conducted to estimate AD-related brain changes. They have consistently shown widespread metabolic deficits in the neocortical association areas, such as the frontal cortex. Further studies have demonstrated that the cerebral metabolic rate of glucose (CMRglc) in the frontal cortex suffers an average decline of 16–19% over a 3-year period (Smith et al., 1992; Mielke et al., 1994). The frontal cortex covers the frontal lobe and contains most of the dopamine neurons. Figure 2 shows the location of the frontal cortex in the brain. The yellow part of the left subfigure is its anatomical location, while the red contour in the right subfigure indicates its location in 18F-FDG PET imaging. Because it is sensitive to frontal cortex changes over time, 18F-FDG PET imaging can be used not only for AD diagnosis but also to monitor dementia progression and therapeutic interventions. Therefore, PET imaging is valuable in the assessment of patients with AD. Moreover, frontal cortex segmentation of PET imaging is crucial for understanding AD progression in AD-related brain regions.
Although frontal cortex segmentation is an important problem for AD research, as far as we know, this paper is the first work that studies the frontal cortex segmentation problem for PET imaging. Unlike an organ or a tumor, which differs from other tissue in gray level, texture, gradients, edges, shape, etc., the frontal cortex is a part of the brain without obvious boundaries. Moreover, supervised learning frameworks need segmentation ground truth from professional doctors, and it is difficult to obtain a large number of annotated images. All of these make frontal cortex segmentation a tough problem.
FIGURE 2 | The frontal cortex in the brain: the left is the anatomical location, and the right is for 18F-FDG positron emission tomography (PET) imaging.
Since manual segmentation is time consuming, automatic semantic segmentation of medical images, which makes changes in pathological structures clear in images, has become one of the hottest research topics in image processing. Currently, more and more machine learning technologies are used in medical applications, such as medical signal processing, medical image processing, and medical data analysis (Jiang et al., 2021a,b; Yang et al., 2021). Brain and brain tumor segmentation is one of the most popular medical image segmentation tasks (Szilagyi et al., 2003; Tu and Bai, 2009; Zhang et al., 2015; Jiang et al., 2019). Many approaches have been proposed to address this problem, such as thresholding (Sujji et al., 2013), edge detection (Tang et al., 2000), Markov random fields (MRF) (Held et al., 1997), and support vector machines (SVM) (Akselrod-Ballin et al., 2006).
Due to the rapid development of deep learning, neural networks, which can extract hierarchical features of images, have become one of the most effective techniques in brain imaging segmentation (Fakhry et al., 2016; Işın et al., 2016; Zhao et al., 2018). U-net (Ronneberger et al., 2015) and its 3D version V-Net (Milletari et al., 2016) are the most well-known deep learning architectures in medical image segmentation. Recently, shape and position priors of organs and tissues have been combined into segmentation algorithms to improve accuracy. Oktay et al. (2017) propose a training framework, ACNN, which incorporates a cardiac anatomical prior into a CNN. Boutillon et al. (2019) combine a scapula bone anatomical prior with a conditional adversarial learning method.
The related work has made great progress in semantic segmentation of medical imaging. However, these methods are not designed for frontal cortex segmentation in PET imaging, and they cannot be applied directly to this problem. Motivated by this, we propose a supervised segmentation framework: Frontal cortex Segmentation model of brain PET imaging (FSPET). The FSPET model, based on both a conditional generative adversarial network (cGAN) and a convolutional auto-encoder (CAE), incorporates the anatomical prior to improve prediction accuracy.
The contribution of FSPET to frontal cortex segmentation is threefold. First, the CAE is used to find the embedding of frontal cortex shape priors in latent space. Second, the segmentation method based on U-net, as the generator of the cGAN, learns the features of the frontal cortex to generate the binary mask in PET imaging. Third, the anatomical prior is fused into the discriminator model of the cGAN to obtain more precise predictions. Extensive experiments demonstrate the effectiveness of the proposed FSPET model.

METHODS
In this section, we introduce the proposed FSPET framework in detail. It incorporates the prior of the frontal cortex shape into deep neural networks, as shown in Figure 3. FSPET contains two parts: cGAN and CAE.

Conditional Generative Adversarial Networks
Generative Adversarial Network (GAN) (Goodfellow et al., 2014) is widely used for data augmentation by generating new images. Since PET imaging shows low contrast, low resolution, and blurred boundaries between different tissues, GAN is becoming a popular method for medical image segmentation (Luc et al., 2016;Son et al., 2017;Souly et al., 2017).
cGAN (Mirza and Osindero, 2014) is an extension of GAN, which is used as a machine learning framework for training generative models. The proposed FSPET model adopts the framework in Conze et al. (2021) based on cGAN, which consists of two neural networks: the generator G and the discriminator D.
The generator G of cGAN in the FSPET model is the segmentation framework, which learns the features of the frontal cortex to generate the binary mask in PET imaging. Formally, let x be the source image and y be the ground truth image with class labels y_i ∈ L = {1, 2, ..., c}. The generator learns the mapping between images and labels, G : x → L, by optimizing the loss function using stochastic gradient descent. The generator of cGAN is often based on the U-net framework. The network consists of a contracting path and an expansive path (shown in Figure 4B). The contracting path is a convolutional network that consists of repeated application of 3 × 3 convolutions, each followed by a rectified linear unit (ReLU) and a 2 × 2 max pooling operation. Dice loss is used in U-net to compare the prediction G(x) with the ground truth y; this loss is denoted L_u.
The inputs of the discriminator D in FSPET (shown in Figure 4C) are the source images and the prediction to be evaluated. D distinguishes the boundary produced by the generator from the realistic segmentation. The output is a binary prediction as to whether the image is real (class = 1) or fake (class = 0). In cGAN, binary cross entropy (BCE) loss is used as the loss function of D.
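The two losses named above can be illustrated with a minimal NumPy sketch. It assumes the common soft Dice formulation, 1 − 2|G(x) ∩ y| / (|G(x)| + |y|), and the usual BCE definition. The function names, the smoothing constant `eps`, and the toy inputs are illustrative assumptions, not the implementation used for FSPET.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss (assumed form): 1 - 2*sum(p*y) / (sum(p) + sum(y))."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)
    total = np.sum(pred) + np.sum(target)
    # eps avoids division by zero when both masks are empty
    return float(1.0 - (2.0 * intersection + eps) / (total + eps))

def bce_loss(prob, label, eps=1e-7):
    """Mean binary cross entropy between probabilities and 0/1 labels."""
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    label = np.asarray(label, dtype=np.float64)
    return float(np.mean(-(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob))))

# Toy example: the prediction covers one of the two foreground pixels
y = np.array([[1, 1], [0, 0]])
g_x = np.array([[1, 0], [0, 0]])
print("Dice loss:", dice_loss(g_x, y))

# Hypothetical discriminator scores for a real pair (label 1) and a generated pair (label 0)
print("BCE loss:", bce_loss([0.9, 0.2], [1, 0]))
```

In this sketch a perfect prediction gives a Dice loss of 0 and a prediction with no overlap gives a loss close to 1, which is why minimizing L_u pushes G(x) toward the annotated region.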

Convolutional Auto-Encoder
An auto-encoder is a type of neural network used to learn a representation (encoding) of a set of data. It imposes a bottleneck in the network, which forces a compressed knowledge representation of the original input. An auto-encoder consists of two parts, the encoder f and the decoder g: the encoder maps the input X to h, usually referred to as the code, the latent representation of the input, and the decoder reconstructs the input from h.
Motivated by Oktay et al. (2017), we utilize the CAE to find the embedding of frontal cortex shape priors in latent space (shown in Figure 4A). The CAE is trained by minimizing the BCE loss with the ground truth y as input. As shown in Figure 4A, after the CAE is fixed, we use its encoder part f for segmentation training: the CAE low-dimensional projection is applied to both the prediction and the ground truth, and the loss between the two latent codes, denoted L_e, is minimized.
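For reference, the auto-encoder mappings and the two CAE-related losses described above can be written in the following form. This is a sketch under assumptions: the symbols \hat{X} and \hat{y}, the name L_{cae}, the pixel count N, and the squared Euclidean distance in the latent loss (a common choice for shape regularizers of this kind) are illustrative and may differ from the exact expressions used for FSPET.

```latex
\begin{align*}
% Encoder f and decoder g; h is the latent code of the input X
h &= f(X), \qquad \hat{X} = g(h) \\
% CAE training on ground-truth masks y with pixel-wise BCE, \hat{y} = g(f(y))
L_{cae} &= -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \Big] \\
% Latent loss with the trained encoder f kept fixed
L_{e} &= \big\lVert f(G(x)) - f(y) \big\rVert_2^2
\end{align*}
```

Because f is fixed during segmentation training, L_e only updates the generator: it penalizes predictions whose latent shape code lies far from that of the annotated mask.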

Fusion
We have obtained three loss functions from different parts of the FSPET model, which are listed in Table 1. Finally, we fuse the U-net segmentation method with frontal cortex shape priors. In the backward propagation, the loss function of generator G combines the U-net loss L_u, the adversarial term from D, and the latent loss L_e.

In the generator loss, λ_1 and λ_2 are the weighting factors. Minimizing L_u tends to provide a rough frontal cortex shape prediction, while maximizing log(D(x, G(x))) is designed to improve contour delineation. At the same time, the latent loss L_e encourages a globally consistent and precise prediction that is similar to the original segmentation. Additionally, the loss function of the discriminator D maximizes log(D(x, y)), the term for the input and its ground truth, and simultaneously minimizes the loss −log(1 − D(x, G(x))) for generated masks. The optimization alternates between G and D using stochastic gradient descent.
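The generator and discriminator objectives described above can be written out as follows. The signs follow directly from the text (minimize L_u and L_e, maximize log D(x, G(x)) for the generator; maximize log D(x, y) and minimize −log(1 − D(x, G(x))) for the discriminator). Attaching λ_1 to the adversarial term and λ_2 to the latent term is an assumption based on the order in which the terms are discussed.

```latex
\begin{align*}
% Generator: Dice term, adversarial term, latent shape term
L_G &= L_u \;-\; \lambda_1 \log D\big(x, G(x)\big) \;+\; \lambda_2\, L_e \\
% Discriminator: BCE with real pairs labelled 1 and generated pairs labelled 0
L_D &= -\log D(x, y) \;-\; \log\!\Big(1 - D\big(x, G(x)\big)\Big)
\end{align*}
```

Under this reading, each alternating step updates G to decrease L_G with D held fixed, then updates D to decrease L_D with G held fixed.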

EXPERIMENTS
In this section, we conduct extensive experiments to validate the effectiveness of FSPET.

Validation Setup
Dataset: We collected 30 18F-FDG PET images from different patients, and their sensitive information was erased. The brain PET images were acquired using a PET/CT scanner (Discovery STE, General Electric, Waukesha, USA) approximately 1 h after an intravenous injection of 18F-FDG (10 mCi). The original data were stored in Digital Imaging and Communications in Medicine (DICOM) format. Images were then resampled to a resolution of 512 × 512 pixels. The frontal cortex in all images was annotated by a radiologist with 7 years of experience to obtain the ground truth and also the shape priors.
Baselines: We compare FSPET with different baseline methods for frontal cortex segmentation in brain PET imaging. The comparison methods used in the experiments are:
• U-net (Ronneberger et al., 2015): U-net is the classical segmentation algorithm for medical images, and the generator G of FSPET is based on U-net. The architecture (shown in Figure 4B) contains a contracting path and a symmetric expanding path. The former is used to capture the context in the image, and the latter is used to enable precise localization using transposed convolutions.
• ACNN (Oktay et al., 2017): It utilizes the auto-encoder and T-L network to combine anatomical prior knowledge into CNNs. These regularizers make predictions that are in agreement with the shape priors.
• cGAN-Unet (Singh et al., 2018): It is proposed for breast mass segmentation in mammography. The cGAN is used in the segmentation framework, in which the generative network learns the features of tumors and the adversarial network guarantees that the contour is similar to the ground truth.
Measurements: With the definitions of true positive (TP), true negative (TN), false positive (FP), and false negative (FN), the following metrics are used to provide an overall assessment of all methods:
• Dice coefficient (dice): It is a similarity measure over prediction and ground truth and ranges between 0 and 1: dice = 2TP / (2TP + FP + FN).
• Jaccard index: Jaccard = TP / (TP + FP + FN).
• Sensitivity: sensitivity = TP / (TP + FN).
• Hausdorff distance (HD): For the point sets A (prediction) and B (ground truth), HD(A, B) = max{h(A, B), h(B, A)}, where h(A, B) = max_{a∈A} min_{b∈B} ||a − b||.
Training Detail: In this paper, we use Adam optimization (Kingma and Ba, 2014), a stochastic gradient descent method based on adaptive estimation of first-order and second-order moments. To train the FSPET model, the CAE with BCE loss is first optimized based on (3), with a learning rate of 0.01, 30 epochs, and a batch size fixed at 32. Then the optimization of cGAN alternates between G and D according to (5) and (6). In (5), the weights are λ_1 = 10^−2 and λ_2 = 10^−4. A learning rate of 10^−4, a batch size of 32, and 30 epochs gave the best performance.
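The evaluation metrics can be sketched in a few lines of NumPy. This is an illustrative implementation under assumptions: masks are binary arrays with at least one foreground pixel each, the Hausdorff distance is computed in pixel units over all foreground pixels by brute force, and the function names are hypothetical rather than taken from the FSPET code.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Return (TP, FP, FN) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, fp, fn

def dice(pred, truth):
    tp, fp, fn = confusion_counts(pred, truth)
    return 2 * tp / (2 * tp + fp + fn)

def jaccard(pred, truth):
    tp, fp, fn = confusion_counts(pred, truth)
    return tp / (tp + fp + fn)

def sensitivity(pred, truth):
    tp, _, fn = confusion_counts(pred, truth)
    return tp / (tp + fn)

def directed_hausdorff(a_pts, b_pts):
    """h(A, B) = max over a in A of the distance to the nearest b in B."""
    diffs = a_pts[:, None, :] - b_pts[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return float(dists.min(axis=1).max())

def hausdorff(pred, truth):
    """HD(A, B) = max(h(A, B), h(B, A)) over foreground pixel coordinates."""
    a = np.argwhere(np.asarray(pred, dtype=bool)).astype(np.float64)
    b = np.argwhere(np.asarray(truth, dtype=bool)).astype(np.float64)
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy masks: the prediction is shifted by one pixel relative to the ground truth
pred = np.array([[1, 1, 0], [0, 0, 0]])
truth = np.array([[0, 1, 1], [0, 0, 0]])
print(dice(pred, truth), jaccard(pred, truth), sensitivity(pred, truth), hausdorff(pred, truth))
```

The brute-force distance matrix grows with the product of the two foreground sizes, so for full 512 × 512 masks it may be preferable to restrict A and B to contour pixels or to use a library routine such as `scipy.spatial.distance.directed_hausdorff`.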

Results
Quantitative metrics and score values for frontal cortex segmentation are provided in Table 2. When comparing U-net and ACNN, the dice score improves from 71.03% to 74.57%, and the HD score decreases from 38.73 to 35.48. This demonstrates that extending U-net with the CAE allows the model to take advantage of the latent representation of shape priors. Moreover, significant improvements can be noticed using cGAN-Unet compared with U-net (38.73 to 30.32 on HD), which indicates the appropriateness of embedding U-net into a cGAN pipeline. Combining the CAE and cGAN networks, the proposed FSPET model discriminates the frontal cortex from surrounding structures more effectively, achieving the best scores with regard to dice, Jaccard index, sensitivity, and HD. In particular, a large gain in terms of Jaccard index (55.04% to 71.47%) is reported between U-net and FSPET.
Qualitative results for frontal cortex segmentation in the median sagittal section of brain PET imaging are displayed in Figure 5. Compared to U-net, ACNN, and cGAN-Unet, which are prone to under- or over-segmentation, sometimes combined with unrealistic shapes, the FSPET model reaches better contour adherence and shape consistency. We also take one example (shown in the bottom right of each subfigure) for comparison, in which FSPET captures more complex shapes and subtler contours than the other frameworks. This reveals the importance of combining both adversarial networks and shape priors in the segmentation task.

CONCLUSION
This paper proposes a deep learning framework to segment the frontal cortex from brain PET imaging. The model, based on both cGAN and CAE, incorporates the anatomical prior to improve prediction accuracy. Future work will apply the proposed method to other AD-related brain areas, such as the hippocampus.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Jiangnan University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
QZ contributed to the conception of the study and wrote the manuscript. YuanyuanL performed the experiment. YuanL helped to perform the analysis with constructive discussions. WH contributed significantly to data preparation.