Edited by: Zhiwei Luo, Kobe University, Japan
Reviewed by: Atif Mehmood, Xidian University, China; Shuwan Pan, Huaqiao University, China
†These authors have contributed equally to this work and share first authorship
This article was submitted to Bionics and Biomimetics, a section of the journal Frontiers in Bioengineering and Biotechnology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Colonoscopy is currently one of the main methods for detecting rectal polyps, rectal cancer, and other diseases. With the rapid development of computer vision, deep learning–based semantic segmentation methods can be applied to the detection of medical lesions. However, it is challenging for current methods to detect polyps with both high accuracy and real-time performance. To solve this problem, we propose a multi-branch feature fusion network (MBFFNet), an accurate real-time segmentation method for polyp detection in colonoscopy. First, we use UNet as the basis of our model architecture and adopt stepwise sampling with channel multiplication to integrate features, which decreases the number of flops caused by stacking channels in UNet. Second, to improve model accuracy, we extract features from multiple layers and resize the feature maps to the same size in different ways, such as up-sampling and pooling, to supplement the information lost in multiplication-based up-sampling. Based on mIOU and Dice loss with cross entropy (CE), we conduct experiments in both CPU and GPU environments to verify the effectiveness of our model. The experimental results show that the proposed MBFFNet is superior to the selected baselines in terms of accuracy, model size, and flops.
Medical image processing is an important part of medical processes. At present, the main research directions in medical image processing include image segmentation, structure analysis, and image recognition. Among these, image segmentation is very important for the detection of lesions and organs, which significantly aids the development of medical automation, reduces the burden on medical workers, and reduces the incidence of medical accidents caused by human error (
Based on the machine algorithm of manually extracted features, features such as color, shape, and appearance have been applied to the classifier to detect polyps (
In this study, to better achieve the precise real-time segmentation task of polyps and considering these problems, we developed the following strategies:
Avoid losing local low-dimensional features through direct large-factor up-sampling, which discards too many features at the segmentation boundary and prevents complete edge information from being restored.
Avoid retaining feature information solely by stacking feature maps along the channel dimension, which bloats the feature maps in the last few layers and forces the model to perform a large number of calculations.
Based on these strategies, we propose a multi-branch feature fusion network for polyp segmentation. We first propagate context information to the higher-resolution layers through progressive up-sampling to obtain preliminary polyp features. This follows strategy 1, and we avoid the channel-dimension superposition of feature information used in UNet (
We propose a model improvement approach that provides effective support for the efficient application of deep learning models in large-scale medical environments.
An efficient polyp segmentation network is proposed that can accurately and effectively segment polyp images without the need for costly computer resources. Real-time colonoscopy detection can be guaranteed using existing computer resources.
Our proposed model shows good performance and generalization ability in a variety of different medical image datasets and can be extended to the detection of other medical issues.
In this article, the detailed model structure and parameter number verification are described in section “Materials and Methods,” the experimental part of the model is discussed in section “Experiments,” and a summary of the model is presented in section “Conclusion.”
In this section, we first introduce and analyze the advantages and disadvantages of PspNet (
With PspNet (
PspNet structure.
To solve the problem of medical image segmentation and accurate boundary segmentation, UNet (
UNet structure.
Considering the above problems and the advantages and disadvantages of the different models, we propose the MBFFNet, which has a lighter network structure and can balance model accuracy with rapid deployment. Compared with UNet (
Multi-branch feature fusion network. The backbone first extracts image features, and feature maps at different down-sampling multiples are superimposed in an hourglass image pyramid (down-sampled maps have a size larger than 128 and up-sampled maps a size smaller than 128; for maps larger than 128 this is equivalent to a 1 × 1 standard convolution). To maximize the use of cross-channel and cross-resolution branches, each branch is up-sampled and multiplied with the previous layer. Finally, a prediction at the original image size is obtained.
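The channel-count contrast between UNet-style concatenation and the multiplication-based fusion described above can be sketched with NumPy. The shapes and the nearest-neighbor up-sampling are illustrative assumptions, not the exact configuration of MBFFNet:

```python
import numpy as np

# Illustrative feature maps: a low-resolution deep map and a
# higher-resolution skip feature (batch, height, width, channels).
deep = np.random.rand(1, 16, 16, 64).astype(np.float32)
skip = np.random.rand(1, 32, 32, 64).astype(np.float32)

def upsample2x(x):
    """Nearest-neighbor 2x up-sampling along the spatial axes."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# UNet-style fusion: concatenating along channels doubles the channel
# count, so later convolutions must process twice as many inputs.
concat_fused = np.concatenate([upsample2x(deep), skip], axis=-1)

# Multiplication-based fusion: the element-wise product keeps the
# channel count fixed, so the following layers stay lightweight.
mul_fused = upsample2x(deep) * skip

print(concat_fused.shape)  # (1, 32, 32, 128)
print(mul_fused.shape)     # (1, 32, 32, 64)
```

The halved channel count after fusion is the source of the flop savings discussed later.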
The polyp images used in this section were derived from the following datasets: ETIS, CVC-ClinicDB, CVC-ColonDB, Endoscene, and Kvasir. Kvasir, released in 2017, is the largest and most extensive of these, and we selected polyp images from its polyp subcategory. CVC-ClinicDB, also known as CVC-612, consists of 612 open-access images obtained from 31 colonoscopy clips. CVC-ColonDB is a small database containing 380 images from 15 short colonoscopy sequences. ETIS is an established dataset containing 196 polyp images for the early diagnosis of colorectal cancer. Endoscene is a combination of CVC-612 and CVC-300. We integrated these data, eliminated the blurry images, and finally obtained 1450 polyp images as the experimental data for this section.
To prove that the proposed model has better generalization ability, we collected a variety of medical image segmentation datasets for verification of our model. Common medical images share certain similarities. Therefore, we selected a larger number of medical image datasets to verify the robustness of our model.
In addition, our datasets are obtained from publicly available competitive medical datasets online, follow standard biosecurity and institutional safety procedures, and can be downloaded online. The raw data are available in articles, supplements, or repositories.
This dataset consists of 30 images of the subbasal corneal nerve plexus, obtained from 30 different subjects who were either normal or pathological (diabetes mellitus, pseudoexfoliation syndrome, or keratoconus). The instrument used to acquire these data was a Heidelberg Retina Tomograph II with a Rostock Corneal Module (HRTII32-RCM) confocal laser microscope.
This dataset was provided by the MICCAI 2018 LITS Challenge and consisted of 400 CT scans. Two distinct labels were provided for ground truth segmentation: liver and lesion. In our experiment, we treated only the liver as positive and the other parts as negative.
This dataset was provided by the Lung Image Database Consortium Image Collection (LIDC-IDRI) and was collected by seven academic centers and eight medical imaging companies. To simplify processing, only the lungs were segmented, and the remaining non-lung organs were treated as background.
This dataset was provided by the electron microscopy (EM) Segmentation Challenge as part of ISBI 2012. The dataset consists of 30 continuous slice transmission electron microscope images (512 × 512 pixels) of the ventral nerve cord of first instar Drosophila melanogaster larvae.
This dataset was provided by the Kaggle Data Science Bowl 2018 Segmentation Challenge and consists of 670 images of segmented nuclei acquired under different modalities (bright-field and fluorescence). This is the only dataset in this work that uses instance-level annotation, where each nucleus is colored differently.
This task is based on the DRIVE dataset, which uses photographs from a diabetic retinopathy screening program in the Netherlands. The aim was to isolate the blood vessels in the fundus images.
This dataset was provided by the First Affiliated Hospital of Sun Yat-sen University and comprised a total of 13,240 CT images (80 × 80) labeled by professional doctors. The goal of this dataset was to segment the esophageal cancer region in the CT image, with the non-esophageal cancer region as the background.
For the polyp segmentation experiment in this section, the framework used for the training model was TensorFlow (
Compared with natural images, medical images such as those of polyps, the liver, and the bowel have the following characteristics. First, they come in a variety of modalities, and the different imaging mechanisms of different modalities produce different formats, sizes, and quality, so the network must be designed to extract features across modalities. Second, the shape, size, and position of different tissues and organs vary greatly. Third, texture features are weak, which demands a stronger feature extraction module. Fourth, boundaries are fuzzy, which is not conducive to accurate segmentation.
To train our model effectively, we divided each dataset in an 8:2 ratio: 80% of the data were used for model training and 20% for model testing.
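The 8:2 split can be sketched as a shuffled index partition. The index pool and random seed here are illustrative, standing in for the 1450 polyp images:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical pool of sample indices standing in for the 1450 polyp images.
indices = np.arange(1450)
rng.shuffle(indices)

split = int(0.8 * len(indices))       # 8:2 train/test ratio
train_idx, test_idx = indices[:split], indices[split:]

print(len(train_idx), len(test_idx))  # 1160 290
```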
To improve the robustness of the model, appropriate image augmentation is required for the training images. In this study, brightness enhancement, scaling, horizontal flipping, shifting, rotation, and channel transformation were performed on the training images. Owing to the limited number of medical images, we could not use the aggressive augmentation common in general image tasks, so we chose the augmentation parameters most commonly used with existing medical images. The specific proportions and effects are listed in
Image enhancement setting parameters.
Brightness | −0.2 to 0.2 |
Zoom | −0.75 to 2 |
Horizontal flip | 0.5 |
Shift | 0.5 |
Rotation | −0.5 to 0.5 |
Channel transformation | 10 |
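A minimal sketch of how the table's ranges might be sampled and applied is given below. The parameter names, the interpretation of each range, and the exact transform details are assumptions; only a light subset of the transforms (brightness, flip, channel shift) is implemented here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ranges taken from the table above; how each is interpreted
# (units, interpolation) is an assumption for illustration.
params = {
    "brightness": (-0.2, 0.2),    # additive brightness shift
    "zoom": (-0.75, 2.0),         # scale range (not applied below)
    "hflip_prob": 0.5,            # chance of a horizontal flip
    "shift_prob": 0.5,            # chance of a translation (not applied)
    "rotation": (-0.5, 0.5),      # rotation range (not applied)
    "channel_shift": 10,          # max per-channel intensity shift
}

def augment(img):
    """Apply a light subset of the table's augmentations to an HxWxC image."""
    out = img.astype(np.float32)
    out = out + rng.uniform(*params["brightness"])        # brightness
    if rng.random() < params["hflip_prob"]:               # horizontal flip
        out = out[:, ::-1, :]
    shift = rng.uniform(-params["channel_shift"],         # channel shift
                        params["channel_shift"], size=(1, 1, out.shape[2]))
    return np.clip(out + shift, 0.0, 255.0)

img = rng.uniform(0, 255, size=(64, 64, 3))
aug = augment(img)
print(aug.shape)  # (64, 64, 3)
```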
Image enhancement renderings.
To fully verify the accuracy of the proposed model, we chose three evaluation indicators that assess the model as a whole and demonstrate its effect more fully and intuitively. The three metrics are as follows.
This metric calculates the ratio of the intersection and union of the sets of true and predicted values, that is, the number of true positives (TP) divided by the sum of TP, false positives (FP), and false negatives (FN). An FN is predicted as negative but labeled positive; an FP is predicted as positive but is actually negative; and a TP is predicted as positive and is actually positive, indicating a correct prediction, where
In an ideal situation, both evaluation indexes would be high. However, high precision generally comes with low recall, and high recall with low precision. In practice, a trade-off is made according to the specific circumstances: in general retrieval, precision should be improved as much as possible while ensuring the recall rate, whereas for cancer detection, seismic detection, financial fraud, and so on, recall should be increased as much as possible while maintaining acceptable precision. A new index, the F-score, balances the two by combining precision and recall into a single measure.
The Dice coefficient is a set similarity measurement function, which is usually used to calculate the similarity between two samples, and its value range is [0,1]. The inclusion of
The loss function (Dice loss) is formulated from the Dice coefficient, because the real goal of segmentation is to maximize the overlap between the real label and the prediction, that is, their similarity. However, training with Dice loss alone can oscillate severely when the positive samples are small targets. With only foreground and background, once some pixels of a small target are mispredicted, the loss value changes significantly, producing drastic gradient changes. In the extreme case of a single positive pixel, a correct prediction for that pixel drives the loss close to 0 regardless of the other pixels, while an error drives it toward 1. The cross-entropy loss (CE loss), by contrast, averages over all pixels and, by virtue of this property, is easy to optimize in a network. Therefore, the loss function adopted in our experiment adds CE loss to the Dice loss, which compensates for some deficiencies of the Dice loss (
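A minimal NumPy sketch of the combined loss follows. The smoothing constant and the simple sum of the two terms are assumptions; weighting schemes between Dice and CE vary in practice:

```python
import numpy as np

def dice_ce_loss(prob, label, eps=1e-6):
    """Dice loss plus binary cross entropy, as described above.

    prob:  predicted foreground probabilities in [0, 1]
    label: binary ground-truth mask
    """
    prob = prob.ravel().astype(np.float64)
    label = label.ravel().astype(np.float64)
    inter = np.sum(prob * label)
    dice = (2.0 * inter + eps) / (np.sum(prob) + np.sum(label) + eps)
    dice_loss = 1.0 - dice
    p = np.clip(prob, eps, 1.0 - eps)   # avoid log(0)
    ce = -np.mean(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))
    return dice_loss + ce

perfect = dice_ce_loss(np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
poor = dice_ce_loss(np.array([0.1, 0.9, 0.9]), np.array([1.0, 0.0, 0.0]))
print(perfect < poor)  # True
```

The CE term contributes a smooth, pixel-averaged gradient even when the Dice term oscillates on small targets.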
This section discusses an experiment conducted on the polyp dataset. To better verify the effectiveness of the proposed model on images of polyp lesions, we evaluated its effect on polyp segmentation. We compared it with popular medical semantic segmentation models: UNet (
We randomly selected four test images from different angles and analyzed our model using multiple contrast models. The segmentation results are shown in
Comparison of model effects. Red represents true positives (TP): areas predicted as polyp that are actually polyp. Blue represents false positives (FP): areas predicted as polyp that are actually non-polyp. Green represents false negatives (FN): areas predicted as non-polyp that are actually polyp.
As shown in
Evaluation indexes of polyp segmentation: mIOU, F1, and Dice loss with CE.
UNet ( |
0.8883 | 0.9354 | 0.1719 |
LinkNet ( |
0.8711 | 0.9238 | 0.1911 |
U2Net ( |
0.8950 | 0.9398 | 0.1528 |
UNet++ ( |
0.8895 | 0.9364 | 0.1642 |
UNet+++ ( |
0.8831 | 0.9312 | 0.1827 |
PraNet ( |
0.9347 | 0.9612 | 0.1012 |
PspNet ( |
0.8612 | 0.8972 | 0.2453 |
Deeplabv3+ ( |
0.8452 | 0.8872 | 0.3214 |
FCN8 ( |
0.8563 | 0.8945 | 0.2752 |
DnlNet ( |
0.8657 | 0.9143 | 0.2064 |
OcrNet ( |
0.8801 | 0.9210 | 0.1953 |
PointRend ( |
0.8585 | 0.9074 | 0.2153 |
MBFFNet | 0.8952 | 0.9450 | 0.1602 |
To better verify whether our model reduces the redundancy of the feature map and the number of parameters and flops of the model, we calculated the number of parameters and flops of the MBFFNet and LinkNet (
The number of parameters of the model mainly depends on the convolution kernels in each convolutional layer. If each kernel has size k × k and the layer maps C_in input channels to C_out output channels, the layer contributes approximately k × k × C_in × C_out weights (plus C_out bias terms), and the model total is the sum over all layers.
The computation of the model is the sum over all convolutional layers. The calculation count of a convolutional layer is determined by the work done in each sliding window and by the number of window positions. In each sliding window, the convolution performs approximately k × k × C_in multiply–accumulate operations per output channel; with an H_out × W_out output map, a layer therefore requires roughly 2 × k × k × C_in × C_out × H_out × W_out floating-point operations.
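These standard counting rules can be expressed as two small helper functions. The example layer (a 3 × 3 convolution from 3 to 64 channels on a 256 × 256 output) is illustrative, not a specific layer of MBFFNet:

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Parameters of a k x k convolution: k*k*c_in*c_out (+ c_out biases)."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def conv2d_flops(k, c_in, c_out, h_out, w_out):
    """Operation count: each output element needs k*k*c_in multiply-
    accumulates, counted here as 2 ops (one multiply, one add)."""
    return 2 * k * k * c_in * c_out * h_out * w_out

# First 3x3 conv of a UNet-like encoder on a 256x256 RGB input:
print(conv2d_params(3, 3, 64))           # 1792
print(conv2d_flops(3, 3, 64, 256, 256))  # 226492416
```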
Using the above formula, the number of parameters in the MBFFNet and the comparison model with flops are shown in
Analysis of the number of parameters and the number of calculations.
UNet ( |
12 | 24.89 | 56.33 |
LinkNet ( |
3 | 11.53 | 1.23 |
U2Net ( |
18 | 96.25 | 40.24 |
UNet++ ( |
20.5 | 36.16 | 135.24 |
UNet+++ ( |
16 | 18.27 | 211.09 |
PraNet ( |
13 | 16.16 | 20.37 |
PspNet ( |
11.5 | 15.11 | 25.57 |
Deeplabv3+ ( |
16.5 | 134.27 | 27.78 |
FCN8 ( |
78.5 | 30.34 | 6390 |
DnlNet ( |
15.5 | 50.13 | 50110 |
OcrNet ( |
5 | 70.35 | 40530 |
PointRend ( |
37.5 | 47.69 | 14640 |
MBFFNet | 5.5 | 23.74 | 15.09 |
To obtain a more intuitive understanding of the effects of the different models, we plotted the flop count on the abscissa and mIOU on the ordinate, using marker size to indicate the number of parameters and thus the size of each model, as shown in
Comparison between the accuracy of different models and flop count.
To verify that the detection rate of our model improves when the numbers of parameters and calculations decrease significantly, we selected images of 256 × 256 and 64 × 64 pixels for experiments and determined whether the model meets application standards under different computing resource environments. According to sales data, we chose mainstream graphics cards currently on the market. The GTX1060 represents midrange cards and is currently the most produced and most widely deployed. The 2060S is a midrange-to-high-end card expected to see wide use in the coming years. If our model runs well on such hardware, it can be used in a wide range of medical environments worldwide to help prevent colorectal cancer and accurately segment polyps and adenomas. To test the actual running performance of MBFFNet, and considering the equipment available in economically underdeveloped areas, we also added two commonly used CPU environments: the AMD R5-3600 and the Intel I7-8750H. In addition, considering that the proposed model is intended for large-scale deployment in medical environments, we did not include traditional segmentation networks with poor segmentation results, such as Deeplabv3+ (
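A typical way to obtain FPS figures like those in the tables below is a warm-up phase followed by timed repeated inference. This sketch uses a stand-in workload; the warm-up and run counts are illustrative choices, and a real measurement would call the model's predict function on a fixed-size image:

```python
import time

def measure_fps(infer, n_warmup=5, n_runs=50):
    """Average frames per second of a single-image inference function."""
    for _ in range(n_warmup):          # warm-up runs are excluded
        infer()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed

# Stand-in for model inference; a real test would run the network
# on one 256x256 (or 64x64) image per call.
fake_infer = lambda: sum(i * i for i in range(10000))
fps = measure_fps(fake_infer)
print(fps > 0)  # True
```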
First, we selected a common medical image size of 256 × 256 as a test, and the test results are presented in
256 × 256 polyp image segmentation FPS.
UNet ( |
4 | 3 | 45 | 21 |
LinkNet ( |
19 | 16 | 115 | 88 |
U2Net ( |
2 | 2 | 23 | 14 |
UNet++ ( |
2 | 2 | 22 | 10 |
UNet+++ ( |
2 | 1 | 16 | 8 |
MBFFNet | 8 | 7 | 55 | 28 |
Subsequently, we conducted FPS test experiments on 64 × 64 images, and the experimental results are listed in
FPS segmentation of 64 × 64 polyp images.
UNet ( |
20 | 19 | 152 | 90 |
LinkNet ( |
84 | 68 | 138 | 141 |
U2Net ( |
13 | 14 | 31 | 23 |
UNet++ ( |
10 | 11 | 98 | 55 |
UNet+++ ( |
9 | 9 | 90 | 68 |
MBFFNet | 33 | 31 | 163 | 112 |
Based on the experimental results, it can be seen that, owing to its low flop count, our model displays excellent real-time performance in environments with limited computer resources, and its advantage grows as resources decrease. Under current computer resources, MBFFNet can already handle accurate, essentially real-time polyp segmentation under a variety of conditions and achieves relatively good results.
For all of the experiments in this section, we chose the same experimental environment and image processing method as the polyp segmentation dataset in
Segmentation effect of liver lesions.
According to the analysis of the experimental results, similar to the results of colonoscopy segmentation, our model is better than PraNet (
Comparison between the accuracy of different models and flop count.
In this article, an MBFFNet is proposed to achieve accurate, real-time segmentation of polyp images. A U-shaped structure such as that of UNet is used to gradually fuse shallow features with high-dimensional features; however, in the fusion process the feature-map superposition used by UNet is abandoned in favor of feature-map multiplication. A five-branch feature map is used, and a pyramid feature map similar to that of PspNet then fuses features as a supplement to the feature information. Finally, the two groups of features are fused to obtain the final segmentation result. The experimental results show that, regardless of polyp size, the algorithm matches the UNet segmentation results in the polyp area, restores edge details well, and achieves a better segmentation effect while significantly reducing the numbers of parameters and calculations, improving the real-time performance of semantic polyp segmentation. Segmentation experiments on other medical images further show that MBFFNet is robust for medical image segmentation.
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.
HS: writing–editing, conceptualization, investigation, and model validation. BL: project administration, writing–editing, and model improvement. XH: writing–original draft and visualization. JL: formal analysis and project improvement. KJ: writing–review. XD: funding acquisition and methodology. All authors contributed to the article and approved the submitted version.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank Tao Liu, Xiaoyan Chen, Jun Li, Jiong Mu, and other teachers for their support, help, and contributions. Without their help, we could not have carried out our university life and related research so smoothly. We would like to thank Editage (
mIOU evaluation index of multi-class medical image segmentation.
UNet ( |
0.9078 | 0.9691 | 0.8106 | 0.8020 | 0.9854 | 0.9641 |
LinkNet ( |
0.8983 | 0.9166 | 0.7457 | 0.7711 | 0.9803 | 0.9279 |
U2Net ( |
0.9126 | 0.9732 | 0.8070 | 0.8031 | 0.9854 | 0.9609 |
UNet++ ( |
0.9108 | 0.9697 | 0.8083 | 0.8023 | 0.9849 | 0.9733 |
UNet+++ ( |
0.9134 | 0.9715 | 0.8077 | 0.7995 | 0.9858 | 0.9707 |
PraNet ( |
0.9453 | 0.9897 | 0.8762 | 0.8862 | 0.9903 | 0.9801 |
PspNet ( |
0.7892 | 0.9568 | 0.5464 | 0.4891 | 0.9667 | 0.9551 |
Deeplabv3+ ( |
0.7871 | 0.9661 | 0.5449 | 0.4890 | 0.9721 | 0.9623 |
FCN8 ( |
0.9041 | 0.9815 | 0.6687 | 0.7172 | 0.9853 | 0.9645 |
MBFFNet | 0.9132 | 0.9704 | 0.8127 | 0.8061 | 0.9884 | 0.9709 |
F1 evaluation index of multi-class medical image segmentation.
UNet ( |
0.9502 | 0.9842 | 0.8872 | 0.8864 | 0.9926 | 0.9815 |
LinkNet ( |
0.9446 | 0.9803 | 0.8379 | 0.8657 | 0.9900 | 0.9616 |
U2Net ( |
0.9527 | 0.9864 | 0.8846 | 0.8873 | 0.9926 | 0.9798 |
UNet++ ( |
0.9519 | 0.9845 | 0.8855 | 0.8865 | 0.9923 | 0.9863 |
UNet+++ ( |
0.9532 | 0.9855 | 0.8851 | 0.8846 | 0.9928 | 0.9850 |
PraNet ( |
0.9732 | 0.9912 | 0.9213 | 0.9274 | 0.9912 | 0.9883 |
PspNet ( |
0.8729 | 0.9778 | 0.6350 | 0.6223 | 0.9828 | 0.9712 |
Deeplabv3+ ( |
0.8714 | 0.9827 | 0.6337 | 0.6170 | 0.9857 | 0.9653 |
FCN8 ( |
0.9478 | 0.9906 | 0.7722 | 0.8278 | 0.9925 | 0.9671 |
MBFFNet | 0.9604 | 0.9839 | 0.8895 | 0.8928 | 0.9926 | 0.9851 |
Dice loss with CE evaluation index for multi-class medical image segmentation.
UNet ( |
0.1264 | 0.0548 | 0.2222 | 0.3310 | 0.0146 | 0.0377 |
LinkNet ( |
0.1423 | 0.0714 | 0.3258 | 0.3822 | 0.0215 | 0.0779 |
U2Net ( |
0.1191 | 0.0471 | 0.2344 | 0.3327 | 0.0148 | 0.0411 |
UNet++ ( |
0.1213 | 0.0547 | 0.2248 | 0.3202 | 0.0151 | 0.0276 |
UNet+++ ( |
0.1199 | 0.0514 | 0.2351 | 0.3331 | 0.0144 | 0.0307 |
PraNet ( |
0.0921 | 0.0321 | 0.1453 | 0.2145 | 0.0101 | 0.0219 |
PspNet ( |
0.3046 | 0.0743 | 0.6231 | 0.8119 | 0.0351 | 0.0801 |
Deeplabv3+ ( |
0.3068 | 0.0565 | 0.6285 | 0.8169 | 0.0290 | 0.0792 |
FCN8 ( |
0.1318 | 0.0350 | 0.4485 | 0.4874 | 0.0145 | 0.0407 |
MBFFNet | 0.1303 | 0.0638 | 0.2545 | 0.3254 | 0.0130 | 0.0303 |