Edited by: Tianyi Yan, Beijing Institute of Technology, China
Reviewed by: Bart M. ter Haar Romenij, Eindhoven University of Technology, Netherlands; Yong Fan, University of Pennsylvania, United States
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Arterial input function (AIF) is estimated from perfusion images as a basic curve for the following deconvolution process to calculate hemodynamic variables to evaluate vascular status of tissues. However, estimation of AIF is currently based on manual annotations with prior knowledge. We propose an automatic estimation of AIF in perfusion images based on a multi-stream 3D CNN, which combined spatial and temporal features together to estimate the AIF ROI. The model is trained by manual annotations. The proposed method was trained and tested with 100 cases of perfusion-weighted imaging. The result was evaluated by dice similarity coefficient, which reached 0.79. The trained model had a better performance than the traditional method. After segmentation of the AIF ROI, the AIF was calculated by the average of all voxels in the ROI. We compared the AIF result with the manual and traditional methods, and the parameters of further processing of AIF, such as time to the maximum of the tissue residue function (Tmax), relative cerebral blood flow, and mismatch volume, which are calculated in the Section Results. The result had a better performance, the average mismatch volume reached 93.32% of the manual method, while the other methods reached 85.04 and 83.04%. We have applied the method on the cloud platform, Estroke, and the local version of its software, NeuBrainCare, which can evaluate the volume of the ischemic penumbra, the volume of the infarct core, and the ratio of mismatch between perfusion and diffusion images to help make treatment decisions, when the mismatch ratio is abnormal.
In recent years, ischemic stroke has become a tremendous health problem all over the world (
Perfusion-weighted imaging (PWI) can be used to assess perfusion parameters for noninvasive diagnosis of stroke conditions. This method involves monitoring the continuous changes of the time density curves (TDCs) of a bolus tracer passing though the capillary bed over time. Quantitative analysis using dynamic susceptibility contrast (DSC) MRI perfusion requires determination of the AIF, which is the concentration of the contrast agent over time in a brain-feeding artery. The tissue TDC can be considered as a convolution of the response function with the AIF. To analyze ischemic tissue, the response function which can be calculated by deconvolution with the AIF is necessary. We operated a deconvolution with the TDC on each voxel to obtain hemodynamic maps containing cerebral blood flow (CBF), cerebral blood volume (CBV), time to maximum of the tissue residue function (Tmax), and mean transit time (MTT). The characteristic TDC of a voxel in a major arterial vessel (like the Basal Artery or the Internal Carotid Artery) is considered as AIF, which is known as a reference curve to calculate hemodynamic maps. The AIF is a key reference curve used in the deconvolution model to obtain quantitative CBF, CBV, Tmax, and MTT estimation. As it is the reference curve, AIF has a great influence on the result of the deconvolution operation. To improve reliability, quality, and reproducibility of the AIF estimation, several approaches have been proposed, including alternative measurement techniques such as application of imaging protocols or data processing. Lorenz and Calamante proposed a local AIF extraction method to replace the global AIF (
The AIF obtained from a single voxel or a small region is not reliable enough, since noise in spatial measurements and motion in temporal measurements affect the AIF estimation. Therefore, it is more appropriate to extract the AIF in a region or volume (
For manual AIF annotation, the investigator selects an AIF with the cursor and marks the position of the AIF on the PWI data. In the meanwhile he checks the corresponding concentration curve of the bolus tracer. The investigator first selects a region of interest associated with the main feeding vessel, such as the middle cerebral artery. The TDC is displayed according to the pixel as the cursor moves. The investigator determines the pixel location in the region of interest, when the curve is consistent with the AIF characteristics. Subjectively, the ideal AIF is defined as a curve with large amplitude, small width, fast attenuation, and it can also be described as a gamma variate function fitted to the bolus tracer TDC.
In a 2D network, convolutions only compute features in a plane on the images. It is not applicable to perfusion data analysis, which must extract features in multiple volumes on spatial dimensions and features in multiple frames on the temporal dimension, since 2D convolution can only compute features on static images. 3D CNN is more efficient for temporal features learning than 2D convolution (
Perfusion data are 4D data, both spatial and temporal features play an important role in AIF ROI estimation. 3D convolutions alone cannot perform well in the temporal dimension. Since perfusion data are 4D data, it is difficult to process it in a single network. In order to fuse spatial features with temporal features into our network, we applied a multi-stream 3D CNN that processes information on both dimensions. Thus, our network performs operations on the input volume data in both streams simultaneously. Spatial features such as location information in brain tissue are extracted in the first stream, while temporal features such as the TDC information are captured in the second.
The spatial network operates on spatial volumes with a size of
The temporal network operates on data frames with a size of
The 3D CNN is shown in
Single 3D CNN network architecture. The 3D CNN network architecture includes eight convolutional layers, five pooling layers, and two full connected layers.
Multi-stream 3D CNN network. Each stream is a 3D CNN network, and the streams are combined by a fusion layer using linear SVM.
PWI data are arranged in the order of frame volumes, as shown in
Data sequence order. The sequence is arranged slice by slice in each frame and then arranged frame by frame.
Arterial input function (AIF) curve extracted from time series. The red point shows the location where AIF is extracted, the value of PWI decreases first and then increases with time.
The training of the multi-stream 3D CNN framework for the segmentation of the AIF is done in two steps: manual labeling ROI’s perfusion data and auto-labeling based on similarities among the TDCs.
For labeling, we first proceeded to a manual annotation of the ROI which is used to extract the AIF. Thereafter, we calculated the similarity between the TDCs and the AIFs in a neighborhood, then we input the label into the framework. We labeled each volume into two classes of regions, namely AIF vessels and no AIF vessels, respectively.
We constructed an architecture in this paper to segment the AIF vessels in the perfusion volume. The input image is the 3D volume region. To improve the performance of the 3D CNN in this case, we built the multi-stream model with the spatial and temporal networks. Then, the final probability map is fused together. The loss function over all training datasets was minimized through a mini-batch gradient descent approach, and the minimum batch size was 50 inputs. The spatial learning process goes through 50 epochs with a learning rate of 0.001 and a gradient momentum of 0.9. The same parameter settings are used for epoch number, learning rate, and gradient momentum in the temporal learning process.
After the segmentation of the AIF vessels, we calculated the AIF by averaging all TDCs of voxels in the classified vessels.
In this study, we collected 100 PWI cases in which 30 were healthy cases and 70 were stroke cases. They were used to train and evaluate the performance of the different methods. Sixty PWI cases among the total dataset were acquired on a 1.5T Discovery MR750 GE MRI scanner with contrast agent at a parameter setting of a TE = 2.6 ms, a TR = 22 ms, and a flip angle = 20-degree. The voxel size is 0.43 × 0.43 × 5.00mm3, and each volume contains 512 × 512 × 20 × 50 voxels, corresponding to the width, the height, the number of slices, and the number of frames, respectively. The other 40 PWI cases were acquired on a 3T Verio SIEMENS MRI scanner with contrast agent at a parameter setting of a TE = 3.6 ms, a TR = 21 ms, and a flip angle = 18-degree, their voxel size is 0.43 × 0.43 × 6.50mm3 and each volume size is 512 × 512 × 18 × 35.
The preprocessing for perfusion data includes skull removal, motion correction, slice time correction, spatial smoothing, global drift removal, which are general preprocessing stages for perfusion data. To reduce the impact of the brain skull, each dataset was preprocessed to remove the brain skull using the BET2 method (
In this paper, our experiments were implemented, respectively, using MATLAB 2017b and Python 3.0 in Window 10 OS. Environments were made on a desktop computer with eight Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50 GHz processers, 32 GB of RAM memory, and NVIDIA GeForce GTX 1080.
Although manual annotations of 100 cases of PWI sequences require a considerable amount of time, in each case we manually segmented the vessels of MIP images on axial planes. The performance of the proposed method in cerebrovascular segmentation is evaluated by comparing MIP post-processed binary images of the proposed method with manual annotations of images on axial planes. Because MIP images on axial planes display most of the blood vessels, the comparison of MIP binary images on axial planes can better illustrate cerebrovascular segmentations differences between the proposed method and manual annotations. Therefore, we evaluated the binary classification performance of our proposed method with parameters such as the accuracy, the sensitivity, the specificity, the precision, and the dice similarity coefficient (DSC), defined as
Fuzzy c-means is widely used in determination of the AIF (
Unet3D is a deep learning network applied to 3D data and widely used in biomedical image segmentation. We calculated the maximal intensity projection on the temporal dimension. Then, we segmented the blood vessels by Unet3D and applied fuzzy c-means to determine the AIF only in blood vessels segmented by Unet3D.
Since there is no standard dataset with labeled AIF ROIs, we can only compare all methods with the manual method. The result of the manual method is considered as ground-truth. However, this comparison is not convincing enough, so we compared the further process result by deconvolution: Tmax and rCBF; and it is more persuasive. Subjectively, a single blind investigator will evaluate the shape and location of the AIF result.
The regions in which Tmax is >6 s are considered ischemic regions. CBF is significantly lower in the ischemic region than in the contralateral healthy region. Perfusion–diffusion mismatch is used to identify penumbra in acute stroke. When the apparent diffusion coefficient (ADC) is less than 620, the corresponding region is defined as an infract core in our application. The ischemic region beyond the infarct core is considered to be the ischemic penumbra. Mismatch ratio is the ratio of ischemic penumbra volume to core infarct volume. The larger the ratio is, the more tissues can be saved by treatment. Since the infarct cores were the same in our case, we compared the volumes of the ischemic penumbra and the mismatch ratio obtained by the AIF results, which are presented in our method and the other methods mentioned above.
We show an example of MIP images on temporal dimensions first and then on spatial dimensions.
The AIFs were estimated by different methods: manual, MS3DCNN, fuzzy c-means, and U-Net3D + fuzzy c-means. Subjectively, a single blinded investigator, who is a Doctor of Medicine working in the Department of Radiology in Xuanwu Hospital of Capital Medical University, which has a high level of neurosurgery and neurology, evaluated the results. Objectively, we compared the AIF location, the curve characteristics of AIF, the perfusion maps, and the mismatch volume estimated by all the other three methods with the same corresponding parameters but estimated by the manual method, respectively.
We obtained each AIF ROI on the maximum density projection, and we compared them, as shown in
AIF ROI on each AIF masked on MIP in a single slice.
Subjectively, the AIF extraction location of all samples calculated by our method in this paper is similar to that of the manual annotation. And since the manual annotation of each doctor will be different, hence the investigator believes that the ROI results of MS3DCNN can be consistent with those obtained from manual annotation.
Objectively, we compared the manual annotated AIF location with those from the three other methods. DSC values were estimated by comparing cerebrovascular segmentations of the MIP images in axial planes with the corresponding manual ground-truths. Values in each column are the average of the testing dataset, as shown in
Comparison of DSC value between the automated AIF estimation methods.
Fuzzy c-means | 0.9947 | 0.5073 | 0.9997 | 0.9472 | 0.6141±0.155 |
U-Net3D + fuzzy c-means | 0.9983 | 0.7620 | 0.9993 | 0.8405 | 0.7941±0.048 |
MS3DCNN | 0.9982 | 0.7967 | 0.9991 | 0.7981 | 0.7966±0.035 |
Subjectively, the curve of AIF also conformed to the morphological characteristics mentioned above, such as large amplitude, small width, fast attenuation, and gamma-like shape. In all cases, the investigator believed that AIF obtained by MS3DCNN can have the characteristics mentioned above, as shown in
AIF estimated by manual method, multi-stream 3DCNN, Unet3D + fuzzy c-means, and fuzzy c-means. AIF obtained by multi-stream 3D CNN is closest to the manual AIF.
Due to the different conditions of the patients, including the physical condition, the severity of the disease, the contrast injection time, and the time to start the scan, it is difficult to make statistical analysis of the parameters obtained by direct comparison. So, we only compared the differences of the curve parameters between the AIF extracted by the three automated methods and the AIF extracted by the manual method. The characteristics are amplitude, the center position, and the crest width, so the differences are represented by Δamplitude, Δcenter, and Δwidth. Although we processed all the samples, considering the great number of samples, we only showed 20 of them in
The difference of curve characteristics between the MS3DCNN and the manual method.
1. | 21.51 | 0.68 | 0.33 |
2. | 17.08 | 1.31 | 0.16 |
3. | 11.83 | 0.15 | 0.77 |
4. | 19.74 | 0.58 | 0.36 |
5. | 47.47 | 2.97 | 1.22 |
6. | 11.38 | 0.31 | 0.87 |
7. | 29.05 | 0.38 | 1.27 |
8. | 37.21 | 0.56 | 0.67 |
9. | 12.82 | 0.11 | 0.75 |
10. | 32.60 | 1.29 | 0.38 |
11. | 25.21 | 0.78 | 0.16 |
12. | 4.31 | 2.64 | 2.14 |
13. | 1.23 | 0.23 | 0.29 |
14. | 27.29 | 2.45 | 0.23 |
15. | 15.25 | 0.18 | 0.06 |
16. | 17.82 | 0.35 | 1.35 |
17. | 2.70 | 5.24 | 2.17 |
18. | 25.51 | 1.08 | 0.46 |
19. | 19.26 | 0.61 | 0.08 |
20. | 11.77 | 0.70 | 0.86 |
Mean | 19.10 | 1.08 | 0.71 |
11.04 | 1.23 | 0.59 |
Mean and standard deviation of the difference between the automated methods and the manual method.
MS3DCNN | Mean | 18.12 | 1.05 | 0.72 |
11.16 | 1.21 | 0.59 | ||
Fuzzy c-means | Mean | 14.08 | 1.31 | 0.90 |
12.32 | 1.43 | 0.75 | ||
U-Net3D + fuzzy c-means | Mean | 13.98 | 1.36 | 0.73 |
12.21 | 1.31 | 0.60 |
We calculated the similarity of the curves between each automated method and the manual method. The similarity is calculated by Frechet distance, which is greater than or equal to 0. The smaller the Frechet distance between two curves is, the more similar they will be. For better statistical analysis, all the curves should be on the same scale. They were normalized by the peak value of the manual extracted AIF, so the curve value of manual extracted AIF is between 0 and 1, as a reference. The similarity of all samples was calculated and the means and the standard deviation of the similarity were obtained. The mean value of the similarity of our method was lowest, with the value of 0.83, indicating that MS3DCNN method is the closest to the manual method. The standard deviation of our method was also the lowest, with the value of 0.13, indicating that this method is more stable.
We calculated the response curve to the AIF of each pixel in each sample by deconvolution. Then we calculated the time to peak of response curves as Tmax, and normalized maximum slope as rCBF, collectively known as perfusion maps, as shown in
rCBF calculated by AIF from
Tmax calculated by AIF from
After observing the perfusion maps and analyzing the patient’s medical history, the investigator concluded that the perfusion maps could be used for diagnosis. The distribution of rCBF and Tmax maps obtained by the same deconvolution processing based on the AIF extracted by our method is basically the same as that of the manual method, the location of ischemic regions can be clearly located through the perfusion maps, while the other two methods have lower Tmax and rCBF, which leads us to underestimate of the size of the ischemic penumbra and the severity of ischemia, respectively.
We still took the perfusion maps calculated by AIF extracted manually as the reference, and made an objective comparison by calculating the difference between the perfusion maps calculated based on AIF extracted by each automatic method and those calculated based on AIF extracted by the manual method. The mean and standard deviation of differences between rCBFs and Tmaxs are shown in
The difference of Tmax and rCBF between the automated methods and the manual method.
MS3DCNN | Mean | 0.79 | 0.76 |
0.10 | 0.11 | ||
Fuzzy c-means | Mean | 0.43 | 0.49 |
0.20 | 0.28 | ||
U-Net3D + fuzzy c-means | Mean | 0.39 | 0.43 |
0.29 | 0.24 |
The Tmax and rCBF values obtained by our method are larger than those obtained by the other two methods, and their performance is consistent with that AIF. The values of Tmax and rCBF are both superior to the latter two methods due to the lower time to peak and the larger peak value of the AIF.
We applied our method on a cloud platform, Estroke,
We defined the region with Tmax >6 s as the ischemic region, and the region with ADC which is an additional sequence, less than 620 as the infarct core. The difference between the two volumes was defined as the mismatch volume, and the mismatch volume divided by the ADC < 620 was defined as the mismatch ratio. The larger the mismatch ratio is, the bigger is the volume of brain tissue that can be saved. A mismatch annotation and information was shown in
Stroke analysis results of the ischemic region, the infarct core, the mismatch volume, and the mismatch ratio. The infarct core was marked in magenta while the ischemic region was marked in green.
The infarct core cannot be found in the image of many samples, so this ratio will be infinite, and for different methods, the ischemic area will be different because of the AIF extracted by different method, but the infarct core based on the ADC is the same. This is the reason we did not compare mismatch ratios, but only compared the mismatch volumes.
The mismatch volume depends on the severity of the stroke. For example, some samples have only small ischemic areas, and others have an entire brain hemisphere tissues with ischemia, there is a huge difference between such samples. For this reason we calculated the mean of the mismatch volumes for all samples with stroke in each method, but without standard deviation, and the ratio to the manual method as the reference was also calculated, as shown in
The average mismatch volume and ratio to reference of the automated methods and the manual method.
Manual (reference) | 48.52 | 100 |
MS3DCNN | 45.27 | 93.32 |
U-Net3D + fuzzy c-means | 41.26 | 85.04 |
Fuzzy c-means | 40.29 | 83.04 |
CNN models are deep learning models which have been widely used for object recognition and segmentation. They are usually trained with a large amount of images labeled by humans. But this has not yet been applied for AIF estimation. While automatic AIF estimation only relies on and is heavily influenced by one’s prior knowledge, most of the traditional segmentation methods are unsupervised. These last ones can only extract objects based on observable or expressed features using prior knowledge. In this study, we applied a multi-stream 3D CNN to find the AIF ROI, then we calculate the average curve as AIF.
According to our experimental results, multi-stream 3D CNN has a good performance in AIF ROI segmentation. Different blood vessels share many similar features such as their shapes, while their differences mainly are intensity contrast and vessel thickness. On the timeline, the changes of each voxel are continuous and have large amplitude, small width, fast attenuation, and gamma-like shape. The network extracts features from both the spatial and the temporal parts, followed with a fusion classification to output the result. Moreover, to improve the robustness of the proposed method for different kind of perfusion images, we included from both 1.5T GE and 3.0T SIEMENS, images of healthy and ischemic brain tissue in the training dataset.
Because manual annotations from public datasets were not available, hence we decided to directly compare our method to other network-based segmentation methods. We therefore compare our method to a traditional method, and to a network pre-process method followed by a traditional method. However, in terms of Dice numbers, our unsupervised method shows great potential to perform AIF ROI segmentation.
RAPID is a currently available commercial software which can measure ischemic penumbra. It has been proved effective in several international multicenter clinical trials (
We proposed a new multi-stream 3D CNN network to estimate AIF in brain perfusion images. The model was trained by the labels obtained from manual annotations and similar ROIs based on annotations, which is cost effective in terms of manual efforts. This segmentation framework had a good performance evaluated on perfusion images. The AIF estimation also had a good performance evaluated on PWI data.
Written informed consent was obtained from all participants and all protocols were approved by the Institutional Review Board.
SF, YB, and YK designed the model and implementation. EW, XJ, and QY provided the data and was responsible for the analysis and evaluation of the results. DW manually calibrated and designed the experiment. SF wrote the manuscript. YK reviewed the manuscript.
SF, YB, and YK were employed by Neusoft Institute of Intelligent Healthcare Technology, Co. Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.