Deep Network for the Automatic Segmentation and Quantification of Intracranial Hemorrhage on CT

Background The ABC/2 method is usually applied to evaluate intracerebral hemorrhage (ICH) volume on computed tomography (CT), although it might be inaccurate and is not applicable to estimating extradural or subdural hemorrhage (EDH, SDH) volume because of their irregular hematoma shapes. This study aimed to evaluate a deep framework optimized for the segmentation and quantification of ICH, EDH, and SDH.
Methods The training dataset consisted of 3,000 images retrospectively collected from a collaborating hospital (Hospital A), which were segmented by the Dense U-Net framework. Three experienced radiologists determined the ground truth by marking the pixels belonging to the hemorrhage area. We used the Dice and intra-class correlation coefficients (ICC) to test the reliability of the ground truth. The testing datasets consisted of 211 images (internal test) from Hospital A and 86 ICH images (external test) from another hospital (Hospital B). We used scatter plots, ICC, and Pearson correlation coefficients (PCC) against the ground truth to evaluate the performance of the deep framework. Furthermore, to validate the effectiveness of the deep framework, we compared the hemorrhage volume estimates of the deep model with those of the ABC/2 method.
Results The high Dice (0.89–0.95) and ICC (0.985–0.997) showed the consistency of the manual segmentations among the radiologists and the reliability of the ground truth. For the internal test, the Dice coefficients of ICH, EDH, and SDH were 0.90 ± 0.06, 0.88 ± 0.12, and 0.82 ± 0.16, respectively. For the external test, the segmentation Dice was 0.86 ± 0.09. For ICH volume estimation, the Dense U-Net achieved an ICC and PCC of 0.99, outperforming the ABC/2 method.
Conclusion This study revealed the excellent performance of hematoma segmentation and volume evaluation based on Dense U-Net, which indicated our deep framework might contribute to efficiently developing treatment strategies for intracranial hemorrhage in clinics.


INTRODUCTION
Intracranial hemorrhages, such as ICH, EDH, SDH, and subarachnoid hemorrhages (SAH), are dangerous with high mortality and low functional recovery rates (Xi et al., 2006;Qureshi et al., 2009;Soffar, 2012). Moreover, the controversy over surgical intervention and conservative management still exists even though there is no significant outcome difference between the treatments (Mayer and Rincon, 2005;Mendelow et al., 2005;Mendelow et al., 2013). Therefore, the estimation of hemorrhage volume plays a critical role among the prognosis parameters to predict the outcome and standardize the clinical treatment (Broderick et al., 1993;Hemphill et al., 2001). Generally, the ABC/2 method has been used to calculate the hematoma volume at the time of symptomatic ICH diagnosis in previous studies (Kothari et al., 1996;Divani et al., 2011;Yaghi et al., 2015). With this approach, the hematoma was estimated as an ellipsoid, and A, B, and C were the orthogonal axes measured from CT or magnetic resonance imaging (MRI) (Hemphill and Lam, 2017;Liu et al., 2019). However, it was difficult to precisely quantify hemorrhage volume due to the limitations of the conventional ABC/2 method. Though the ABC/2 method might be practical and time-efficient for ICH with a single bleeding site, it might lead to inaccurate measurements for ICH with multiple bleeding sites (Huttner et al., 2006), and is not applicable for other intracranial hemorrhage types due to their irregular hematoma shapes. Moreover, a manual delineation of the contours of the hematoma could be time-consuming, which is not suitable for emergency settings. Therefore, computer-aided image segmentation could provide accurate and fast volume estimation for brain hemorrhages. A recent study showed that the ICH volume could be estimated using a random-forest based machine learning algorithm, with a Pearson correlation coefficient of 0.96 against the manual segmentation (Scherer et al., 2016). 
However, that study comprised 58 cases in total and was limited to spontaneous ICH. In addition, its accuracy decreased as the hematoma volume increased. Another study demonstrated that region-proposal convolutional neural networks could be used to simultaneously detect and segment brain hemorrhages (Chang et al., 2018). However, the dataset used in that study came from a single institution, and the ground truth was generated semiautomatically and determined by only one radiologist, which might not be reliable for training and testing a deep learning (DL) model.
In this study, we first applied a DL model based on the Dense U-Net architecture (Ronneberger et al., 2015;Guan et al., 2019) to segment and quantify three types of brain hemorrhage (ICH, EDH, and SDH) on non-contrast CT images. Since contrast CT scans are needed for accurate SAH segmentation, SAH was excluded from the current study. To test the reliability of the ground truth masks used in this study, we calculated the Dice and ICC among the three experienced radiologists. Then, on the internal and external test sets, the segmentation Dice was evaluated to test whether the constructed model could successfully segment intracranial hemorrhage on head CT. Finally, we compared the ICC and PCC of ICH volume estimations between the deep model and the ABC/2 method to validate the effectiveness of the constructed framework. We hope that this study can assess the feasibility of the constructed deep model for the accurate segmentation and quantification of intracranial hemorrhage on non-contrast CT and provide a guide for clinical decision-making.

Data Collection
Institutional Review Board (IRB) approval was received from the collaborating hospitals, and informed consent was waived for this retrospective study. A total of 3,000 non-contrast brain CT scans containing ICH, EDH, and SDH were retrospectively collected from Beijing Tiantan Hospital Affiliated to Capital Medical University (Hospital A) for model training and validation. Each hemorrhage type had 1,000 scans, which were randomly partitioned for model training and validation at a ratio of 80:20. Another 211 scans from the same hospital were reserved as the internal test set, in which ICH, EDH, and SDH had 61, 87, and 63 scans, respectively. To test the validity of the deep framework, we collected 86 ICH CT scans from the QingPu Branch of Zhongshan Hospital Affiliated to Fudan University (Hospital B) as an independent testing set (i.e., external test set). In addition, to evaluate the performance of the model on non-hemorrhagic cases, 450 cases comprising 48 hemorrhagic and 402 non-hemorrhagic cases were also collected from Hospital B. All images were acquired using brain CT protocols on scanners from various vendors, with an x-ray tube voltage of around 120 kV and a current of around 400 mA. The matrix size was 512 × 512, and most of the scans had a slice thickness of 5 mm.

Ground Truth Determination
To determine the segmentation ground truth for model training, each CT scan was independently examined by three certified neuroradiologists and annotated using 3D Slicer (Carrboro, NC, United States) (Kikinis et al., 2014). The common segmentations, i.e., pixels that were marked as hemorrhage positive by at least two neuroradiologists, were considered the ground truth, and each annotated slice was fed into the 2D segmentation network for training. The agreement of each radiologist's segmentation with the ground truth and the agreement of their volume estimates were also evaluated using the Dice and ICC, respectively.
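The two-of-three consensus rule described above can be illustrated with a minimal sketch. The function name `majority_vote`, the nested-list mask representation, and the toy annotations are assumptions made for illustration only; they are not taken from the study's annotation pipeline.

```python
from typing import List

Mask = List[List[int]]  # one 2D slice: 1 = hemorrhage pixel, 0 = background


def majority_vote(masks: List[Mask], min_votes: int = 2) -> Mask:
    """Keep a pixel as hemorrhage when at least `min_votes` raters marked it."""
    rows, cols = len(masks[0]), len(masks[0][0])
    consensus = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            votes = sum(mask[r][c] for mask in masks)
            consensus[r][c] = 1 if votes >= min_votes else 0
    return consensus


# Toy 2 x 3 slice annotated by three hypothetical raters.
rater_a = [[1, 1, 0], [0, 1, 0]]
rater_b = [[1, 0, 0], [0, 1, 1]]
rater_c = [[1, 1, 0], [0, 0, 1]]
ground_truth = majority_vote([rater_a, rater_b, rater_c])
```

With three raters and `min_votes=2`, a pixel marked by a single rater is discarded, so the consensus mask is less sensitive to one reader's outlying contour.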

Model Construction and Training
The model was constructed on the MXNet platform (Chen et al., 2015). The DenseNet encoder consisted of 4 dense blocks, each followed by a transition layer (Huang et al., 2017). The four dense blocks had 3, 6, 12, and 8 dense units, respectively, and each dense unit had a growth rate of 32. Feature maps extracted from each dense block were concatenated with the up-sampled maps to form a U-Net structure. The input into the model was one CT scan, and the output was the segmented masks for all slices. Detailed structures of the constructed model are illustrated in Figure 1.
FIGURE 1 | Illustration of the constructed Dense U-Net. The Dense U-Net had four dense blocks, with 3, 6, 12, and 8 dense units in each dense block. The growth rate was 32 for all dense units. BN, batch normalization; ReLU, rectified linear unit; Conv, convolution.
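To make the encoder configuration concrete, the following sketch tallies the dense units and the channel count leaving each dense block. Only the block sizes (3, 6, 12, 8) and the growth rate of 32 come from the text. The 64 stem channels and the transition-layer compression of 0.5 are assumptions borrowed from the standard DenseNet-121 configuration, and the function name is hypothetical.

```python
from typing import List, Tuple


def dense_encoder_channels(
    units_per_block: Tuple[int, ...],
    growth_rate: int,
    stem_channels: int = 64,   # assumed, not stated in the text
    compression: float = 0.5,  # assumed transition-layer compression
) -> List[int]:
    """Return the number of feature maps at the output of each dense block."""
    channels = stem_channels
    outputs = []
    for units in units_per_block:
        # Each dense unit concatenates `growth_rate` new feature maps.
        channels += units * growth_rate
        outputs.append(channels)
        # Transition layer after the block reduces the channel count.
        channels = int(channels * compression)
    return outputs


this_model = (3, 6, 12, 8)       # block sizes given in the text
densenet_121 = (6, 12, 24, 16)   # standard DenseNet-121 block sizes

units_this_model = sum(this_model)
units_densenet_121 = sum(densenet_121)
skip_channels = dense_encoder_channels(this_model, growth_rate=32)
reference_channels = dense_encoder_channels(densenet_121, growth_rate=32)
```

Under these assumptions the block outputs, which are the maps concatenated with the up-sampled decoder features, are roughly half as wide as those of DenseNet-121.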
During the implementation of the DL model, He-normal initialization was used for all convolutional kernels (He et al., 2015). The DL model was trained using a binary cross-entropy loss function (De Boer et al., 2005), which can make training more stable than a Dice loss function, and was optimized with a stochastic gradient descent (SGD) optimizer (Bottou, 2010). The learning rate was set to 0.001 with a momentum of 0.99. The model was trained on four Nvidia GTX 1080 graphics processing units (32 gigabytes total memory capacity) with a batch size of one. No image augmentation was applied during model training. Training was stopped after 20 epochs, when the validation loss showed no further improvement. No dropout layers (Srivastava et al., 2014) were used because no overfitting problem was observed. The DL model produced pixel-wise segmentation results. Combined with the CT slice thickness, these allowed the volume represented by each pixel to be calculated. The hemorrhage volume was then obtained by summing the volumes of all pixels in the hemorrhage region.
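The volume calculation described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name, the nested-list mask format, and the 0.5 mm in-plane pixel spacing are assumptions, while the 5 mm slice thickness matches the value reported for most scans.

```python
from typing import List, Tuple

Mask = List[List[int]]  # one axial slice: 1 = hemorrhage pixel, 0 = background


def hemorrhage_volume_ml(
    slices: List[Mask],
    pixel_spacing_mm: Tuple[float, float],
    slice_thickness_mm: float,
) -> float:
    """Sum the volumes of all hemorrhage pixels across the slices of one scan."""
    # Volume represented by one pixel: in-plane area times slice thickness.
    voxel_mm3 = pixel_spacing_mm[0] * pixel_spacing_mm[1] * slice_thickness_mm
    n_positive = sum(sum(row) for mask in slices for row in mask)
    return n_positive * voxel_mm3 / 1000.0  # 1 ml = 1000 mm^3


# Toy scan: two 2 x 3 slices with 3 and 5 hemorrhage pixels.
toy_scan = [
    [[1, 1, 0], [0, 1, 0]],
    [[1, 1, 1], [0, 1, 1]],
]
volume_ml = hemorrhage_volume_ml(
    toy_scan, pixel_spacing_mm=(0.5, 0.5), slice_thickness_mm=5.0
)
```

In practice the pixel spacing and slice thickness would be read from the scan metadata rather than hard-coded.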

Performance and Statistical Analysis
After training, the model was tested on the reserved testing data. The segmentation performance of the model was analyzed using the Dice coefficient against the ground truth. The sum of the segmented areas of each slice was multiplied by the corresponding slice thickness, yielding the hemorrhage volume of a patient. The segmentation-based volume estimation was then compared with the ground truth using scatter plots, ICC, and PCC. Because the hemorrhage volume differences were not normally distributed, a non-parametric Wilcoxon signed-rank test (Rey and Neuhäuser, 2011) was used to evaluate the systematic volume bias of the segmentation-based method.
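For readers less familiar with these metrics, the sketch below shows how the Dice coefficient and the PCC could be computed with the Python standard library. The function names, the flattened-mask input format, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the ICC is omitted because the text does not state which ICC form was used.

```python
import math
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2 * |P and T| / (|P| + |T|) for flattened binary masks."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy example: one overlapping pixel out of two marked in each mask.
dice = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
# Toy volumes (ml): estimates that are an exact multiple of the reference.
r = pearson_r([10.0, 20.0, 30.0], [20.0, 40.0, 60.0])
```

The second toy case also shows why the PCC alone cannot reveal systematic bias: estimates that are exactly twice the reference still correlate perfectly, which is one reason a separate bias test is useful.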

Manual and Automatic Segmentation Evaluation
For the testing data originating from Hospital A, Table 1 demonstrates the performance of the segmentation by the radiologists against the ground truth and their in-group volume estimation agreements using Dice and ICC, respectively. The high Dice (0.89-0.95) and ICC scores (0.985-0.997) showed that the manual segmentations were consistent among the three radiologists and the ground truth was reliable for model training and evaluation. Figure 2 shows the segmentation examples of ICH, EDH, and SDH using the modified Dense U-Net. The ground truth

Statistical Analysis
For ICH from Hospital A, the mean difference between the ABC/2 method and the ground truth was 3.2 ml, with a standard deviation of 11.5 ml and a mean absolute difference of 7.3 ml. In contrast, for ICH volume estimated by the deep framework, the mean difference was 1.3 ml, the standard deviation was 1.6 ml, and the mean absolute difference was 1.5 ml. For ICH from Hospital B, the mean difference of the ABC/2 method was −2.1 ml, with a standard deviation of 10.1 ml and a mean absolute difference of 7.0 ml. For the DL model, the mean difference was −0.5 ml, with a standard deviation of 4.1 ml and a mean absolute difference of 0.6 ml.
For EDH and SDH, only the hemorrhage volumes estimated by the deep framework were analyzed because the ABC/2 method is not applicable for measuring the volumes of intracranial hemorrhage with irregular shapes. For EDH, the mean difference between the DL model and the ground truth was −0.3 ml, with a standard deviation of 1.4 ml and a mean absolute difference of 1.1 ml. For SDH, the mean volume difference was −1.2 ml, with a standard deviation of 1.5 ml and a mean absolute difference of 1.4 ml.
The Wilcoxon signed-rank test showed that the DL-based segmentation tended to systematically overestimate ICH volume by 1.0 ml for Hospital A (p < 0.001) and underestimate it by 0.4 ml for Hospital B (p < 0.001) compared with the ground truth. For EDH, there was no clear difference in hemorrhage volume estimation between the DL-based segmentation and the ground truth (p = 0.296). For SDH, the statistical test showed that the DL model underestimated the hemorrhage volume by 0.8 ml (p < 0.001). However, although the over- and underestimations were statistically significant, the biases were small and might not have clinical significance.
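The three summary statistics reported in this section can be sketched as follows. The sign convention (estimate minus ground truth, so that positive values indicate overestimation) and the use of the sample standard deviation are assumptions, since the text does not state them; the volumes in the example are made up for illustration.

```python
import statistics
from typing import Dict, Sequence


def difference_summary(
    estimated_ml: Sequence[float], reference_ml: Sequence[float]
) -> Dict[str, float]:
    """Summarize per-scan volume differences between a method and the reference."""
    # Assumed convention: positive difference = overestimation.
    diffs = [e - r for e, r in zip(estimated_ml, reference_ml)]
    return {
        "mean_difference": statistics.mean(diffs),
        "sd_difference": statistics.stdev(diffs),  # sample SD (n - 1), assumed
        "mean_absolute_difference": statistics.mean([abs(d) for d in diffs]),
    }


# Hypothetical volumes (ml) for three scans.
summary = difference_summary([10.0, 22.0, 31.0], [9.0, 20.0, 33.0])
```

A mean difference near zero with a large mean absolute difference indicates errors that cancel out across scans, which is why both values are informative.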
For the data collected to evaluate the performance of the model on non-hemorrhagic cases, the DL model correctly identified 388 of the 402 non-hemorrhagic cases as negative, an accuracy of 96.5%.
In this study, we first revealed the consistency of the manual segmentations among the three radiologists and the reliability of the ground truth masks, with high Dice (0.89–0.95) and ICC scores (0.985–0.997). Then, we demonstrated that the Dense U-Net framework was remarkably accurate in the automatic segmentation of intracranial hemorrhages, including ICH, EDH, and SDH, with high Dice scores for both the internal (0.82–0.90) and external test sets (0.86). We further verified that, compared with the ABC/2 method, ICH volume estimated by the DL model had a stronger correlation with the ground truth volume as reflected by ICC (0.998) and PCC (>0.996-0.998) (see Figure 3 and Table 3).
Lastly, we assessed the model on a larger dataset containing non-hemorrhagic cases to verify its capability to segment lesions accurately without introducing more false positive results. These results indicated that the deep framework was more accurate than the ABC/2 method when quantifying the volume of large, complex-shaped intracranial hemorrhages.

DISCUSSION
This study indicated that the performance achieved by the constructed Dense U-Net was comparable to manual segmentation by radiologists for brain hemorrhage segmentation and volume estimation. Additionally, the model was robust over a broad range of volumes between 1 and 100 ml, and, in contrast to the previously reported method (Scherer et al., 2016), it could be applied to various brain hemorrhage types. Moreover, the constructed Dense U-Net in this study used half of the dense units of the DenseNet reported previously (29 vs. 58 in DenseNet-121) (Huang et al., 2017). That is, our modification halved the model parameters, yet the model could still achieve high segmentation performance using fewer computational resources. With the current configuration and hardware conditions, the trained model could finish inference on one brain CT scan within approximately 5 s. This time efficiency also makes it possible to apply the DL network in emergency settings, particularly for patients with brain hemorrhage.
According to the volume estimation results, the ABC/2 method showed a higher standard deviation of the difference from the ground truth and was thus less stable. The ABC/2 method uses the lengths of the principal axes of an ellipsoid to assess its volume, yet the shape of lesions, i.e., the hemorrhagic regions, is much more irregular and therefore cannot be consistently assessed accurately by this means. Thus, it is of great importance to find more widely applicable and accurate methods to estimate hemorrhagic volume. Using deep learning algorithms to estimate hemorrhagic volume automatically is one potential approach, and the performance of the proposed Dense U-Net in this study verified its capability to segment and analyze hemorrhagic regions accurately and quickly. To further improve its performance, more possible influencing factors, such as the location and shape of hemorrhagic regions, could be considered in future work.
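The ellipsoid assumption behind ABC/2 can be made explicit with a short sketch. An ellipsoid with diameters A, B, and C along three orthogonal axes has volume (4/3)π(A/2)(B/2)(C/2) = πABC/6, and ABC/2 follows from approximating π by 3. The function names and the example diameters are hypothetical.

```python
import math


def abc2_volume_ml(a_cm: float, b_cm: float, c_cm: float) -> float:
    """ABC/2 estimate from three orthogonal diameters in cm (1 cm^3 = 1 ml)."""
    return a_cm * b_cm * c_cm / 2.0


def ellipsoid_volume_ml(a_cm: float, b_cm: float, c_cm: float) -> float:
    """Exact volume of an ellipsoid with diameters A, B, and C."""
    return math.pi / 6.0 * a_cm * b_cm * c_cm


# Hypothetical hematoma measuring 4 x 3 x 2 cm.
abc2 = abc2_volume_ml(4.0, 3.0, 2.0)
exact = ellipsoid_volume_ml(4.0, 3.0, 2.0)
ratio = exact / abc2  # equals pi / 3 for any A, B, C
```

Even for a perfect ellipsoid the two values differ by a constant factor of π/3 (about 5%). For irregular or multifocal hematomas the ellipsoid model itself no longer holds, which is the limitation discussed above.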
Inter-institutional robustness of the model was checked by testing it on internal and external test sets from different hospitals. The average Dice coefficient on the external test set (ICH from Hospital B) dropped by about 4% compared with that on the internal test set (from Hospital A), which might be because data from Hospital B were not used for model training. For example, the two hospitals used different scanners: Hospital A mainly used scanners manufactured by GE and Siemens, while Hospital B mainly used scanners from GE and United Imaging Healthcare. Although only ICH scans were curated from Hospital B, the DL model still yielded excellent performance, indicating its strong robustness for the segmentation and quantification of intracranial hemorrhages. Furthermore, the good robustness of the Dense U-Net structure across institutions could make its application significantly more widespread.

LIMITATION
Though the small volume estimation difference of the segmentation-based method might not have any clinical significance (see the "Statistical Analysis" section of "Results"), it was interesting to note that this method tended to slightly overestimate the hemorrhage volume for ICH and systematically underestimate it for SDH. More testing data might be needed to confirm this conclusion. Nonetheless, such a pattern might be related to how the model processed the edges of ICH and SDH. Since SDH lies close to the skull, which also has high intensity on CT images, the model might tend to drop certain SDH pixels at the interface. This might also be the reason for the relatively low Dice scores of SDH.

CONCLUSION
This study demonstrated the high performance of a deep framework based on Dense U-Net for the automated segmentation and quantification of intracranial hemorrhages, including ICH, EDH, and SDH on non-contrast CT. Furthermore, the deep model also achieved strong robustness when tested on internal and external datasets from different hospitals. Moreover, the Dense U-Net utilized significantly fewer model parameters yet achieved accurate segmentation and precise volume quantification performance. With the high performance and time-efficiency, the model might potentially provide a promising tool to assist with treatment decisions for intracranial hemorrhages.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Boards of Beijing Tiantan Hospital Affiliated to Capital Medical University and QingPu Branch of Zhongshan Hospital Affiliated to Fudan University. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
JX, RZ, and JM conceived the study and participated in the literature search, study design, data collection, and data analysis. ZZ and CW completed the data analysis and interpretation. SW and GW completed the statistical analysis. RZ, HZ, and CX prepared the manuscript. YD edited the manuscript. All authors read and approved the final manuscript.