Cerebral Micro-Bleeding Detection Based on Densely Connected Neural Network

Cerebral micro-bleedings (CMBs) are small chronic brain hemorrhages that have many side effects. For example, CMBs can result in long-term disability, neurologic dysfunction, cognitive impairment and side effects from other medications and treatment. Therefore, it is important and essential to detect CMBs timely and in an early stage for prompt treatment. In this research, because of the limited labeled samples, it is hard to train a classifier to achieve high accuracy. Therefore, we proposed employing Densely connected neural network (DenseNet) as the basic algorithm for transfer learning to detect CMBs. To generate the subsamples for training and test, we used a sliding window to cover the whole original images from left to right and from top to bottom. Based on the central pixel of the subsamples, we could decide the target value. Considering the data imbalance, the cost matrix was also employed. Then, based on the new model, we tested the classification accuracy, and it achieved 97.71%, which provided better performance than the state of art methods.


INTRODUCTION
Cerebral micro-bleeding (CMB) are small chronic brain hemorrhages that can be caused by structural abnormalities of the small vessels of the brain. According to the recent research reports, the causes of CMBs also can be some other common reasons, including high blood pressure, head trauma, aneurysm, blood vessel abnormalities, liver disease, blood or bleeding disorders and brain tumors (Martinez-Ramirez et al., 2014). It also can be caused by some unusual etiologies, such as cocaine abuse, posterior reversible encephalopathy, brain radiation therapy, intravascular lymphomatosis, thrombotic thrombocytopenic purpura, moyamoya disease, infective endocarditis, sickle cell anemia/β-thalassemia, proliferating angio-endotheliomatosis, cerebral autosomal dominant arteriopathy subcortical infarcts, leukoencephalopathy (CADASIL), genetic syndromes, or obstructive sleep apnea (Noorbakhsh-Sabet et al., 2017). The patients suffering from CMBs can have symptoms where the corresponding area that is controlled by the bleeding area malfunctions, resulting in a rise in intracranial pressure due to the large mass putting pressure on the brain and so on. CMBs could be easily ignored as the similar symptoms and signs of the subarachnoid hemorrhages, unless the patients have more obvious symptoms, such as a headache followed by vomiting. Those symptoms can eventually become worse or occur suddenly, based on the distribution and intensity of the CMBs. Patients suffering from CMBs can result in cognitive impairment, neurologic dysfunction and long-term disability. CMBs could also induce side effects from medication or treatments. The worse thing is that the death is possible and can happen quickly. Therefore, the early and prompt diagnosis of CMBs is essential and helpful in timely medical treatment.
Due to the paramagnetic susceptibility of the hemosiderin (Allen et al., 2000), CMBs can be visualized by T2 * -gradient recalled echo (GRE) imaging or susceptibility weighted imaging (SWI). Traditionally, CMBs are manually interpreted based on criteria including shapes, diameters and signal characteristics after imaging. However, the criteria were varied as reported in different studies (Cordonnier et al., 2007), until 2009 when Greenberg et al. (2009) published the consensus on standard criteria for CMB identification. However, manual detection methods involve the human interventions, which can bring biases. Meanwhile, the manual detection is labor intensive, hard to reproduce and difficult to exclude the mimics, which can lead to misdiagnosis.
Therefore, the development of automatic CMB detection is important and essential for the accurate detection and early diagnosis of CMBs. Due to the benefits of advanced imaging technologies, massive computer vision aided systems have been developed for automatic CMB detection. For example, Ateeq et al. (2018) proposed a system based on an ensemble classifier. Their system consisted of three steps: first the brain was extracted, then the initial candidates were detected based on the filter and threshold, and finally, feature extraction and classification model were built to remove the false alarms. Fazlollahi et al. (2015) proposed using a multi-scale Laplacian of Gaussian (msLoG) technique to detect the potential CMB candidates, followed by extracting a set of 3-dimensional Radon and Hessian based shape descriptors within each bounding box to train a cascade of binary random forests. Barnes et al. (2011) proposed a statistical thresholding algorithm to recognize the potential hypointensities. Then, a supervised classification model based on the support vector machine was employed to distinguish true CMBs from other marked hypo-intensities. van den Heuvel et al. (2016) proposed an automatic detection system for microbleeds in MRIs of patients with trauma based on twelve characteristics related with the dark and spherical characteristics of CMBs and the random forest classifier. Bian et al. (2013) proposed a 2D fast radial symmetry transform (RST) based method to roughly detect the possible CMBs. Then the 3D region growing on the possible CMBs was utilized to exclude the falsely identified CMBs. Ghafaryasl et al. (2012) proposed a computer aided system based on following three steps: skull-stripping, initial candidate selection and reduction of false-positives (FPs) by a two-layer classifier. Zhang et al. (2017) proposed voxel-vise detection based on a single hidden layer feed-forward neural network with scaled conjugate gradient. Chen (2017) proposed a seven-layer deep neural network based on the sparse autoencoder for voxel detection of CMBs. Seghier et al. (2011) proposed a system named MIDAS for automatic CMB detection.
All above methods have reached great progress in CMB detection. However, their detection accuracy and robustness are still in need of improvement. Therefore, in this paper, we employed the SWI for CMB imaging, which was because SWI could provide high resolution as reported in Haacke et al. (2009) and work as the most sensitive techniques to visualize CMBs. Considering the limited amounts of labeled images, and knowledge to recognize representative characters about the medical images, we considered utilizing the DenseNet as the basic algorithm for transfer learning. The reason for this is because the amount of labeled CMB images is typically very limited, and it is hard to effectively train a classifier to get high detection accuracy. In summary, we proposed using transfer learning of DenseNet for CMB detection based on the collected images, which means we use the knowledges obtained from training the related tasks by DenseNet for CMB detection.
The remainder of this paper is organized in a structure as follows: "Materials and Methods" section describes the method used in this research, "Transfer Learning" section explains why we employed the transfer learning, "CMB Detection Based on the Transfer Learning and DenseNet" section describes the research materials used in this paper, including the training set and test set, and also offers the experiment results, and finally, "Discussion" section provides the conclusion and discussion.

MATERIALS AND METHODS
In recent years, Deep Learning (DL) has achieved great progress in object recognition (Tong et al., 2019;Xie et al., 2019), prediction (Yan et al., 2018;Hao et al., 2019), speech analysis (Cummins et al., 2018), noise reduction (Islam et al., 2018), monitoring Wang et al., 2018), medicine (Raja et al., 2015;Safdar et al., 2018), the recommendation system (Zhang and Liu, 2014), biometrics (Xing et al., 2017) and so on. Traditionally, DL consists of multiple layers of nonlinear processing units to obtain the features. The cascaded layers take the output from their previous layer as input. In order to explore the potential of DL, many researchers tried to make the network deeper and wider. However, it suffers from either exploding or the vanishing gradient problem (VGP). Therefore, multiple different structures of DL were proposed. For example, AlexNet, the winner of ImageNet Large Scale Visual Recognition Competition (ILSVRC) 2012, was proposed by Krizhevsky et al. (2012) and has the same structure as LeNet but has max pooling and ReLU non-linearity. VGGNet, proposed by Karen Simonyan (2015), won the second place in ILSVRC 2014 and consisted of deeper networks (19 layers) compared to AlexNet. GoogLeNet, the winner of ILSVRC 2014, provides a deeper and wider network that incorporates 1 × 1 convolutions to reduce the dimensionality of the feature maps before expensive convolutions, together with parallel paths with different receptive field sizes to capture sparse patterns of correlations in the feature map stacks. ResNet, the winner of ILSVRC 2015, offers a 152-layer network that introduces a skip at least two layers or shortcut connections (He et al., 2016). Huang et al. (2017) proposed DenseNet where each layer takes the output from all previous layers in a feed-forward fashion and offers L(L+1)/2 connections for L layers, while traditional convolution networks with L layers provide L connections. According to the report in Huang et al. (2017), DenseNet can beat the state-of-the-art ResNet structure on ImageNet classification task.
Considering the outstanding performance of DenseNet, we proposed employing DenseNet for cerebral microbleed detection in this paper. The detail of DenseNet was introduced as follows. However, before providing the illustration of the DenseNet, we would first introduce the traditional convolution neural network (CNN) and figure out the difference between CNN and DenseNet later.

Traditional Convolution Neural Network
The traditional CNN usually includes convolution layer, ReLU Layer, pooling layer, fully connected layer and softmax layer (Zeng et al., 2014(Zeng et al., , 2016a(Zeng et al., ,c, 2017a. The functions of different layers are introduced as follows: Convolution layer works as the core session of a CNN. The feature maps are generated via the convolution of the input with different kernels. Mathematically, it can be expressed as Figure 1, which shows a toy example of convolution operation.
Then, following the convolution layer, we have non-linear activation function, named ReLU, which works to obtain the nonlinear features. The purpose of the ReLU layer is to introduce non-linearity into the network. The mathematic expression of ReLU is shown as Eq. 1: The pooling layer works by resizing the feature maps spatially to decrease the number of parameters, memory footprint and to make the computation less intensive in the network. The pooling function works on each feature map, the main approaches used for pooling are max pooling as Eq. 2, average pooling as Eq. 3: In which M stands for the pooling region and Rj represents for the number of elements within the pooling region.
Fully connected layers will calculate the confidential scores, which are stored in a volume of size 1 × 1 × n. Here, n means the number of categories, and each element stands for class scores.
FIGURE 1 | A toy example of convolution operation in CNN with stride size as 1, in which, the left matrix means the input, the second matrix means the kernel, and the right matrix stands for the generated feature map after convolution operation. It is different from the convolution defined in purely mathematic terms.
Every neuron of the fully connected layer is connected to all the neurons in the earlier layers.

Structure Revision of the CNN
In the traditional CNN, all layers are connected gradually as in Eq. 4: However, as the network becomes deeper and wider, the networks may suffer from either exploding or gradient vanishing. Therefore, researchers proposed different network structures to overcome this problem. For example, ResNet revised this behavior by short connection, and the equation is reformulated as (5).
Instead of making the sum of the output feature maps of the layer with the incoming feature maps, DenseNet concatenates them sequentially. The expression is reformulated into Eq. 6: In which l means the index of the layer number, H stands for a non-linear operation and x l stands for the output of the lth layer.

DenseNet
As expressed in Eq. 6, DenseNet introduces straight forward connections from any layers to all following layers. In other words, the lth layer receives feature-maps from all previous l -1 layers. However, if the feature maps' size changes, the concatenation operation is not feasible. Therefore, downsampling to change the size of the feature maps are introduced. In order to make the down-sampling in the structure of DenseNet possible, multiple densely connected dense blocks are introduced to divide the network. The layers between the blocks are named as transition layers that have batch normalization, convolution and pooling operations, as shown in Figure 2. Figure 2 describes a case of DenseBlock, in which the layer number is 5 and the growth rate is set as k. Each layer receives feature maps from all earlier layers.
For each operation H l , it generates k feature maps, which is defined as growth rate. Therefore, the l th layer will have k 0 + k(l − 1) feature maps, and k 0 is the number of channels in the  input layer. As the network typically has a large number of inputs, a 1 × 1 convolution is employed as the bottleneck layer before the 3 × 3 convolution layer to reduce the feature maps and improve the computation efficiency.
To further compress the model to improve the model compactness, the feature maps are further reduced by the transition layer. For example, if a dense block generates m feature maps and the compression factor is set as θ ∈ (0, 1], then the feature maps will be reduced to θm via the followed transition layer. If θ = 1, the number of feature maps will be the same. Figure 3 shows the structure of DenseNet, which is composed of three DenseBlocks, an input layer and transition layers. The cropped samples are used as the input, the final layer will tell us whether it is CMB or Non-CMB in this research.

TRANSFER LEARNING
DenseNet has been widely applied in the medical research. For example, Gottapu and Dagli (2018) proposed using DenseNet for Anatomical Brain Segmentation. Khened et al. (2019) proposed cardiac segmentation based on fully convolutional multi-scale residual DenseNets. Wang H. et al. (2019) offered a system for recognition of mild cognitive impairment (MCI) and Alzheimer's disease (AD), based on the ensemble of 3D densely connected convolution network. Considering the limited amounts of labeled training samples, it is far way from enough to retrain the whole network of DenseNet from scratch to get a high classification accuracy. Therefore, in this paper, we proposed transfer learning, which means frozen the earlier layers and retrain the later layers of DenseNet for CMB detection task. The structure of DenseNet used here is DenseNet 201.
In order to make the pretrained DenseNet 201 for CMB detection feasible, which was a binary classification of CMB or non-CMB, the fully connected (FC) layer with

Materials
The subjects used in this research are ten healthy controls and ten patients of CADASIL. Twenty 3D volumetric images were obtained from the 20 patients. Then, Software Sygno MR B17 was utilized to rebuild the 3D volumetric image. Each 3D volumetric image's size is uniformly set as 364 * 448 * 48. In order to mark the CMBs from the subjects manually, we employed three neuron-radiologists with more than twentyyears' experience. The rules were set as follows: (1) via tracking the neighbor slices, blood vessels were first excluded, (2) lesions should be smaller than 10 mm in diameter. The potential CMBs were labeled as either "possible" or "Definite, " Otherwise, regarded as non-CMB voxels. In case of the conflictions, we proposed to obey the rule that the minority should be subordinate to the majority.
The sample images were generated from the original image. We applied the sliding window whose size is set as 61 by 61 to the original image. The border pixels were discarded due to the fat and brain skull. All the pixels located within a sliding window were used as one input, and the point located in the center of the sliding window was used as the target value. It means that if the central pixel is true or 1, then the target value is 1, otherwise, the Conv+pooling FCL Softmax Frontiers in Neuroscience | www.frontiersin.org target label is set as 0. It is expressed in the Eqs 7 and 8: Where I stands for the cropped sample images generated via the sliding window, p represents for the central pixel, W(p) means the pixels which centered on pixel p and were located inside the sliding window, and Ou means the label value. Figures 4, 5 show the sample of CMB and non-CMB centered images. The sliding window was supposed to cover the image from left to right and top to bottom with the stride size as 1. Therefore, we got the total CMB voxels as 68, 847 and non-CMB voxels as 56, 582, 536. The training and test set was divided as Table 1. We randomly selected 10000 images for each category of the test, and the remaining images were used for training.
To make the images suitable for DenseNet, which should be resized as 224 × 224 × 3, we padded the images with zero. The preprocessed image sample is shown as Figure 6. Then, Figure 7 shows the flowchart of the DenseNet, including number of feature maps generated by each layer.
From Table 1, we can find that the Non-CMB training data dominates the majority type CMB, which will cause the classifier more bias toward to the Non-CMB. Therefore, it may cause difficulties in controlling false positives and false negatives, which means the model is hard to find the CMB samples. Therefore, in order to overcome this challenge, we introduced cost matrix (Wu and Kim, 2010). The cost ratio ct was set as 961 via Eq. 9: In which N non−CMB means the number of non-CMB training samples and N CMB stands for number of CMB training samples.
The reason for why we employ the cost matrix instead of over sampling or down sampling is mainly because we have more concerns about the false positives and false negatives, therefore it is better to highlight the imbalanced learning problem by using cost matrices instead of creating a balanced data distribution forcefully.

Experiment Design
The goal of this research is to identify the input image as either CMB or Non-CMB. In order to achieve this goal, we proposed using DenseNet 201 as the basic algorithm for transfer learning, based on the excellent performance of DenseNet on ImageNet classification task. Section "Materials" stated the materials used in this research. Based on the original images, we created 68, 847 CMB subsamples and 56, 582, 536 Non-CMB subsamples. 10000 samples were randomly selected as test samples. The remaining sub-samples were used for training. In order to overcome the problem of data imbalance, we proposed cost matrix to show the more concerns in false positive and false negatives. The experiment is carried on by Matlab on the Windows 10 Operation System with 2.88 GHz processor and 8 GB memory. The following experiments were carried out: (1) CMB detection based on DenseNet. The measurements used here include accuracy,  sensitivity and specificity. The definition of the measurements can be found in Zhang et al. (2018b).
(2) Different cutting points of transfer learning.
(3) In order to show the performance of proposed methods, we compared with other state of art work.
Considering the measurements provided in other research, we only used sensitivity, specificity and accuracy.
In order to provide better illustration of DenseNet, we added a flowchart with feature map size, learnable weights of each layer. As we only noted the size of the width, the length should be same with the width.

CMB Detection Result Based on DenseNet
The rebuilt network was composed of four DenseBlocks, one input layer, three transition layers, one fully connected layer with two neurons, a softmax layer and a classification layer, as described in Figure 8. Table 2 provides the detection result. The correctly detected CMBs were 9777, and for Non-CMB they were 9764. 236 non-CMBs were incorrectly detected as CMBs, and 223 CMBs were wrongly detected as non-CMBs. The sensitivity was achieved as 97.78%, the specificity was 97.64%, the accuracy was 97.71% and the precision was 97.65%. Above measurements were obtained based on the average of 10 runs as shown in Table 3.

Comparison to the Different Cases of Transfer Learning
In order to achieve the best performance of transfer learning, different cutting points for transfer learning were designed as shown in Figure 8. Due to the limited subjects, we mainly focused on retraining the later layers of DenseNet. Therefore, in case A, the DenseNet 201 except for the last three layers, was used as the feature extractor for this research, and we retrained the newly added three layers.
In case B, C, and D, we included extra layers for retraining. For example, case B retrained the DenseBlock 4, Batch normalization, Average pooling, Fully connected (FC) layer, softmax layer and classification layer. It was implemented via setting the learning rate to 0 for earlier layers and setting learning rate factor to 10 for layers to be retrained. Table 4 illustrates the comparison results. From Table 4, we can find that Case A performed slightly better than the other three cases in the terms of sensitivity and FIGURE 9 | Error bar. accuracy. Considering that in medical research, we focus more on the sensitivity and accuracy than on the other two terms, we thought Case A provided the best performance among all the cases. Figure 9 shows the error bar of the measurement values. From the point of storage consuming, all four cases take about the same RAM as we did when not using the precomputation method.

Comparison to the State of Art Work
In order to validate our proposed method, we compared different state of the art methods, including traditional machine learning methods and DL methods. From Table 5, we compared our method with single-hidden layer feed-forward neural-network (SLFN)+ (leaky) rectified linear unit, 4-layer sparse encoder, 7-layer sparse encoder, different layers of CNN, Naive Bayesian Classifier and so on. We can find that our proposed method offers the best performance. DenseNet works as a logical extension of ResNet but provides more compact models and fully uses the features. Figure 10 shows the bar chart of the comparison of the state of the state of art methods. It shows that our proposed method performs slightly better than the current best method, but largely improved compared to the traditional method naïve Bayes classifier (NBC).

DISCUSSION
In this paper, we proposed to employ DenseNet to detect CMBs in patients with CADASIL. DenseNet was proposed by Huang et al. (2017) and competed the other DL methods for ImageNet classification task because of its model compactness and fully used features. DenseNet are quite similar with ResNet, however, instead of the summation, DenseNet proposed the concatenation of all feature maps from previous layers, which encourages the feature reuse, the VGP alleviation, and the decreased number of parameters. Therefore, in this paper, we proposed using DenseNet for CMB detection by supposing CMB detection has similarity with ImageNet classification. However, because of the data imbalance, we utilized cost matrix to avoid the model bias toward non-CMB, which means the model would be hard to find CMBs if trained under the imbalanced dataset. As there are some other methods for data imbalance, such as over sampling and down sampling, we have more concerns about the false negatives or false positives. Therefore, instead of enforcing the data into balanced distribution, we employed the cost matrix. In order to check the best cutting point, we test different cases of transfer learning and the results are shown in Table 4, however, the difference is not so obvious. On the other hand, training less layers can help us save time and decrease the computation cost if we import the strategy of precomputation.
In the future, we will try to collect more samples and test more different structures for CMB detection. Meanwhile, the training cost long term is very high (Zeng et al., 2018), therefore it is necessary to optimize the algorithm to make the training fast (Zeng et al., 2016b,d). We will consider other precomputation and some optimization methods (Zhang et al., 2018a).

DATA AVAILABILITY
The datasets for this manuscript are not publicly available because due to the privacy of the subjects. Requests to access the datasets should be directed to shuihuawang@ieee.org.

ETHICS STATEMENT
This research was approved by Institutional Review Board (IRB) of the First Affiliated Hospital of Nanjing Medical University. We obtained written informed consent from the participants of this study.

AUTHOR CONTRIBUTIONS
SW proposed the study and wrote the draft. CT and JS designed the model and interpreted the results. CT and YZ analyzed the data. SW and YZ acquired the preprocessed data. All authors gave critical revision and consent for this submission.

FUNDING
This project was financially supported by Natural Science Foundation of Jiangsu Province (No. BK20180727).