UGLS: an uncertainty guided deep learning strategy for accurate image segmentation

Accurate image segmentation plays a crucial role in computer vision and medical image analysis. In this study, we developed a novel uncertainty guided deep learning strategy (UGLS) to enhance the performance of an existing neural network (i.e., the U-Net) in segmenting multiple objects of interest from images with varying modalities. In the developed UGLS, a boundary uncertainty map was introduced for each object based on its coarse segmentation (obtained by the U-Net) and then combined with the input images for the fine segmentation of the objects. We validated the developed method by segmenting optic cup (OC) regions from color fundus images and left and right lung regions from X-ray images. Experiments on public fundus and X-ray image datasets showed that the developed method achieved an average Dice score (DS) of 0.8791 and a sensitivity (SEN) of 0.8858 for the OC segmentation, and corresponding values of 0.9605, 0.9607, 0.9621, and 0.9668 for the left and right lung segmentation. Our method significantly improved the segmentation performance of the U-Net, making it comparable or superior to five sophisticated networks (i.e., AU-Net, BiO-Net, AS-Net, Swin-Unet, and TransUNet).


Introduction
Image segmentation is an important research direction in computer vision and medical image analysis, and it is widely used as a preprocessing step for various object detection and disease diagnosis tasks (Khened et al., 2018; Jun et al., 2020). It divides an image into several disjoint regions by performing a pixel-level classification and largely simplifies the assessment of the morphological and positional characteristics of object regions (Wang L. et al., 2022; Li et al., 2022). To accurately segment images, a number of segmentation algorithms have been developed for many different applications, such as threshold based methods (Pare et al., 2019; Shahamat and Saniee Abadeh, 2020), active contour based methods (Han and Graphics, 2006), and random field based methods (Poggi and Ragozini, 1999; Hossain and Reza, 2017). Among these methods, deep learning based methods (Ronneberger et al., 2015; Wang Y. et al., 2022) have gained considerable popularity in the past decade because they can achieve remarkable segmentation performances comparable to manual annotations. Moreover, they are able to automatically extract and flexibly integrate different types of feature information by learning the intrinsic laws and representation levels of the images to be segmented. Despite promising performances, deep learning based methods often face two key challenges in image segmentation (Wang et al., 2021c; Zheng et al., 2022): how to obtain rich local information, and how to robustly extract high-level semantics. Given the large number of parameters in deep learning networks, the spatial resolution of feature maps generally decreases as network depth increases in order to speed up the learning of feature information. This resolution decrease can cause the loss of local information, although the increase of network depth is beneficial to the acquisition of global semantic and context information. To mitigate these two challenges, different deep learning networks (Gawlikowski et al., 2023; Seoni et al., 2023) have been constantly emerging to accurately segment images with varying modalities. Alom et al. (2019) put forward the RU-Net and R2U-Net by adding different recurrent convolutional blocks to the U-Net for feature detection and accumulation. Seo et al. (2020) proposed an mU-Net model by introducing learnable deconvolution network structures into the U-Net to improve its learning ability at different resolutions and its segmentation performance. Huang et al. (2020) proposed a U-Net 3+ model that combines high-level semantics with low-level semantics using full-scale skip concatenation to overcome the drawbacks of the U-Net and U-Net++ (Zhou et al., 2018). Cao et al. (2022) and Chen et al. (2021) proposed different transformer based networks (i.e., Swin-Unet and TransUNet, respectively) for accurate image segmentation. These network models demonstrated reasonable segmentation accuracy compared to the U-Net, but their network structures were often more complex, which may not be conducive to network construction and training, or to image segmentation.
To avoid the design of complex network structures, we develop an uncertainty guided deep learning strategy (UGLS) in this study based on an existing network (i.e., the U-Net) for accurate image segmentation. We first train the U-Net to obtain a coarse segmentation result and then use morphological operations and Gaussian filters to identify a potential boundary region for each target object based on the obtained result. The boundary region has a unique intensity distribution that indicates the probability of each pixel belonging to object boundaries, and it is termed the boundary uncertainty map (BUM) of the object. With the boundary uncertainty maps and the original input images, we retrain the U-Net for the fine segmentation of target objects and can obtain a better performance than its coarse segmentation.

Scheme overview
Figure 1 shows the entire workflow of the developed deep learning strategy (UGLS) based on an available network (i.e., the U-Net) for image segmentation. The UGLS consists of three key steps: the coarse segmentation of target objects, the generation of boundary uncertainty maps for each object, and object fine segmentation. The coarse segmentation is used to detect potential object regions and exclude irrelevant background far away from the detected regions. With the coarse segmentation, we can identify the regions where object boundaries are likely to appear and then generate boundary uncertainty maps for these objects, which can largely enhance the information about object boundaries and facilitate boundary detection. We integrate these uncertainty maps with the original input images and feed them into the given network for a finer segmentation. After performing these three steps, the network can obtain a significantly improved segmentation performance.

Object coarse segmentation
We first trained the U-Net based on the given images and their manual annotations, leveraging a plain network training scheme, to obtain a relatively coarse segmentation result for desirable objects. This training procedure can be given by

P = f(I; φ), (1)

where I and P indicate the input image and its corresponding prediction map, respectively, and f(•) denotes the U-Net with network parameters φ. The prediction map was relatively coarse compared with the manual annotations of objects because the U-Net has a simple network structure and thereby limited potential to handle images of varying quality.

Boundary uncertainty map
The obtained coarse segmentation results often differed from the manual annotations in certain image regions, especially object boundary regions, but they can still provide important position information for desirable objects. To effectively use this position information, we processed the coarse segmentation results with morphological dilation and erosion operations (Fang et al., 2021), leading to two different object regions. Based on these two regions, we can identify a potential boundary region (PBR) and a background excluded image (BEI) for each target object, which were separately given by

PBR = dilation(P, SE) − erosion(P, SE), (2)

BEI = I ⊙ PBR, (3)

where dilation(•) and erosion(•) are the morphological dilation and erosion operations, respectively, SE is a circular structuring element with a radius of r, and ⊙ denotes the element-wise product. The PBR is a binary image that marks the region where object boundaries are most likely to appear, while the BEI merely retains the original image information located in the PBR and can reduce the impact of redundant background in image segmentation, as shown in Figure 2. To take full advantage of the edge position information in the coarse segmentation results, we smoothed the PBR using a Gaussian filter with an r × r rectangular window and a standard deviation of r to generate a boundary uncertainty map. Pixels in the uncertainty map took larger values when they were close to the center of the PBR and smaller values when far away from this center. A larger value generally means a higher probability that a pixel belongs to object boundaries. This unique intensity distribution enabled the boundary uncertainty map to provide more relevant position information about object boundaries than the PBR.
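The PBR/BEI/BUM construction described above can be sketched as follows. This is a minimal illustration using scipy rather than the authors' implementation; the circular structuring element and the Gaussian window handling (scipy truncates the kernel at a multiple of sigma rather than using an exact r × r window) are our assumptions.

```python
import numpy as np
from scipy import ndimage

def boundary_uncertainty_map(coarse_mask, r):
    """From a binary coarse segmentation mask, derive the potential boundary
    region (PBR) and smooth it into a boundary uncertainty map (BUM)."""
    # Circular structuring element SE with radius r.
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    se = (xx ** 2 + yy ** 2) <= r ** 2
    dilated = ndimage.binary_dilation(coarse_mask, structure=se)
    eroded = ndimage.binary_erosion(coarse_mask, structure=se)
    # PBR: the band between dilation and erosion, where boundaries likely lie.
    pbr = dilated & ~eroded
    # Gaussian smoothing (std r) of the binary PBR yields the BUM; pixel
    # values peak near the band's center and decay away from it.
    bum = ndimage.gaussian_filter(pbr.astype(np.float32), sigma=r)
    return pbr, bum

def background_excluded_image(image, pbr):
    # BEI: keep original intensities inside the PBR only.
    return image * pbr

# Toy example: a filled disc standing in for a coarse segmentation mask.
mask = np.zeros((64, 64), dtype=bool)
yy, xx = np.mgrid[:64, :64]
mask[(yy - 32) ** 2 + (xx - 32) ** 2 <= 15 ** 2] = True
pbr, bum = boundary_uncertainty_map(mask, r=3)
```

In this toy case, pixels deep inside the object fall outside the PBR, while pixels on the object contour get the largest BUM values, matching the intensity distribution described above.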

Object fine segmentation
After obtaining the boundary uncertainty map and the background excluded image, we concatenated these two types of images and fed them into the segmentation network. Since the concatenated images differed from the original images and contained very little background information, the segmentation network can more easily detect object boundaries and thereby accurately extract the whole object regions with a simple experiment configuration. Specifically, we implemented the fine segmentation of desirable objects using the same configuration as their coarse segmentation (e.g., the cost function, optimizer and batch size).
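The channel-wise concatenation that forms the fine-stage input can be sketched as below; the exact channel order and shapes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fine_segmentation_input(bei, bum):
    """Stack a background excluded image (BEI) and a boundary uncertainty
    map (BUM) channel-wise to form the fine-stage network input.
    bei: (H, W) grayscale or (H, W, C) color; bum: (H, W)."""
    if bei.ndim == 2:  # grayscale case, e.g., chest X-rays
        bei = bei[..., None]
    bum = bum[..., None]
    return np.concatenate([bei, bum], axis=-1).astype(np.float32)

# A color fundus patch (3 channels) plus one uncertainty channel -> 4 channels.
x = fine_segmentation_input(np.zeros((256, 256, 3)), np.zeros((256, 256)))
```

Because only the input tensor changes, the network architecture and training configuration can stay identical between the coarse and fine stages, as described above.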

Experiment datasets
To validate the developed learning strategy, we performed a series of segmentation experiments on two public datasets, as shown in Figure 3. The first dataset was from the Retinal Fundus Glaucoma Challenge (REFUGE) (Orlando et al., 2020) and contained 1,200 retinal fundus images acquired by two different cameras, together with manual annotations of the optic disc (OD) and cup (OC) regions. In the REFUGE challenge, these images and their annotations were evenly split into three subsets for training (n = 400), validation (n = 400) and testing (n = 400), and the same splits were used in this study. We normalized these images to reduce the influence of light exposure and camera differences, and then extracted local disc patches with dimensions of approximately three times the radius of the OD region (Wang et al., 2021b). The extracted patches were then resized to 256 × 256 pixels and fed into the U-Net for network training. The second dataset was from a tuberculosis screening program in Montgomery County (TSMC) (Jaeger et al., 2014) and contained 138 chest X-ray images acquired using a Eureka stationary X-ray machine. Among these X-ray images, 80 were normal and 58 were abnormal with manifestations of tuberculosis. All images were deidentified and had a dimension of either 4,020 × 4,892 or 4,892 × 4,020 pixels. The left and right lungs depicted on these X-ray images were manually annotated by a radiologist. We also split these X-ray images equally into three disjoint subsets for network training (n = 46), validation (n = 46) and testing (n = 46), and resized them to 256 × 256 pixels.
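The disc-patch extraction step can be sketched as follows. The square-crop rule (side of about three OD radii) follows the description above, but the centering convention and the nearest-neighbor resize are simplifying assumptions; the paper does not specify the interpolation method.

```python
import numpy as np

def resize_nearest(img, out_size):
    """Nearest-neighbor resize of a 2-D array to out_size x out_size."""
    rows = np.arange(out_size) * img.shape[0] // out_size
    cols = np.arange(out_size) * img.shape[1] // out_size
    return img[np.ix_(rows, cols)]

def extract_disc_patch(image, center, od_radius, scale=3.0, out_size=256):
    """Crop a square patch whose side is roughly `scale` times the OD radius,
    centered on the optic disc, then resize it to out_size x out_size."""
    half = int(round(scale * od_radius / 2))
    r0, r1 = max(center[0] - half, 0), min(center[0] + half, image.shape[0])
    c0, c1 = max(center[1] - half, 0), min(center[1] + half, image.shape[1])
    return resize_nearest(image[r0:r1, c0:c1], out_size)

# Hypothetical fundus image with the OD near the image center.
patch = extract_disc_patch(np.zeros((1024, 1024)), center=(512, 512), od_radius=120)
```

Cropping around the OD before segmentation keeps the OC at a usable scale in the 256 × 256 network input, which would otherwise be dominated by retinal background.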

Performance evaluation
We assessed the performance of the UGLS based on the U-Net (short for the developed method, https://github.com/wmuLei/ODsegmentation) on a 64-bit Windows 10 PC with a 2.20 GHz Intel(R) Xeon(R) Gold 5120 CPU, 64 GB RAM and an NVIDIA GeForce RTX 2080Ti by segmenting 1) the OC region from color fundus images and 2) the left and right lungs from the X-ray images, where r was set to 25 and 35, respectively, for these two datasets. We used the Dice score (DS) (Shi et al., 2022) as the cost function to assess the similarity between the segmentation results and their corresponding manual annotations for each object:

DS_k = 2 Σ_{i∈Ω} p_{k,i} y_{k,i} / (Σ_{i∈Ω} p_{k,i} + Σ_{i∈Ω} y_{k,i}), k = 1, …, K, (4)

where DS_k denotes the DS for object k, and K is the total number of objects of interest. p_{k,i} and y_{k,i} are the output probability obtained by the U-Net and the manual annotation, respectively, for pixel i and object k of a specific input image, and Ω denotes the entire image domain. We used the RMSprop optimizer to maximize the cost function and set its initial learning rate to 0.001, along with a batch size of eight and an epoch number of 100. To reduce the network training time, we halted the entire training procedure when the performance of the U-Net did not increase for 20 consecutive epochs. In addition, we randomly augmented the input images during network training using transformations such as horizontal/vertical flips, scaling from 0.9 to 1.1, translation by −10 to 10 percent per axis, rotation from −180 to 180 degrees, and shearing from −5 to 5 degrees. After training, we binarized the prediction map of the U-Net using a threshold of 0.5 to obtain the final output results. With these output results, we evaluated our developed method using the DS, Matthew's correlation coefficient (MCC) (Zhu, 2020), sensitivity (SEN) (Wang et al., 2019), and Hausdorff distance (HSD, in pixels).
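The per-object Dice score defined above can be implemented directly as a sketch; the small `eps` constant is our own addition to guard against empty masks and is not part of the paper's formulation.

```python
import numpy as np

def dice_score(p, y, eps=1e-7):
    """Soft Dice score between a probability map p and a binary annotation y
    for a single object; the multi-object cost averages this over K objects."""
    p = p.reshape(-1).astype(np.float64)
    y = y.reshape(-1).astype(np.float64)
    return (2.0 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)

y = np.zeros((8, 8))
y[2:6, 2:6] = 1.0
s = dice_score(y, y)  # perfect overlap -> 1.0
```

Because the score is differentiable with respect to p, it can serve directly as the training objective that RMSprop maximizes, as described above.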

MCC = (Tp·Tn − Fp·Fn) / √((Tp + Fp)(Tp + Fn)(Tn + Fp)(Tn + Fn)), (5)

where Tp, Fp, Tn and Fn denote the numbers of true positive, false positive, true negative and false negative pixels, respectively. The HSD is computed from the directed distance d(X, Y) = max_{x∈X} min_{y∈Y} |x − y| from point set X to point set Y. The larger the DS, MCC and SEN are, and the smaller the HSD is, the better the segmentation performance of the network. To show the advantage of the UGLS, we compared the developed method with the Attention U-Net (AU-Net) (Oktay et al., 2018), BiO-Net (Xiang et al., 2020), asymmetric U-Net (AS-Net) (Wang et al., 2021b), Swin-Unet (in its tiny version), and TransUNet. Among these networks, the U-Net and its variants (i.e., AU-Net, BiO-Net, AS-Net) shared a similar network architecture (e.g., the number of convolution filters increased from 32 to 1,024) and were trained from scratch with an input dimension of 256 × 256 pixels and a learning rate of 0.001, while Swin-Unet and TransUNet were initialized with ImageNet weights and trained with an input dimension of 224 × 224 pixels and a learning rate of 0.01. All these networks were trained six times (by randomly arranging the three subsets for network training, validation and testing, respectively) using the same configurations (except for image dimension and learning rate) for each dataset. The paired t-test was used to evaluate the differences among the involved networks on the DS metric. A p-value less than 0.05 was considered statistically significant (Wang et al., 2021a).
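The MCC and Hausdorff distance can be computed as sketched below using `scipy.spatial.distance.directed_hausdorff`; evaluating the HSD on all foreground pixels (rather than extracted boundary pixels only) is a simplifying assumption.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def mcc(pred, gt):
    """Matthews correlation coefficient for binary masks."""
    pred, gt = pred.astype(bool).ravel(), gt.astype(bool).ravel()
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def hausdorff(pred, gt):
    """Symmetric Hausdorff distance (in pixels): the larger of the two
    directed distances between the masks' foreground coordinates."""
    x, y = np.argwhere(pred), np.argwhere(gt)
    return max(directed_hausdorff(x, y)[0], directed_hausdorff(y, x)[0])
```

Identical masks yield MCC = 1 and HSD = 0; shifting a mask by one pixel raises the HSD to 1, matching the intuition that lower HSD means closer boundaries.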

Object coarse segmentation
Tables 1 and 2 summarized the six coarse segmentation results of the U-Net with the developed UGLS strategy in extracting the OC from retinal fundus images and the left and right lungs from X-ray images, respectively. As demonstrated by the results, the U-Net achieved a relatively low performance in segmenting the OC depicted on fundus images (due to the high similarity between the OD and OC regions), with an average DS, MCC, SEN and HSD of 0.8642, 0.8585, 0.8674 and 2.6420, respectively. In contrast, it obtained a better accuracy for the left and right lungs (with average DS of 0.9408 and 0.9477, respectively) that was comparable to their manual annotations.

Object fine segmentation
Tables 3 and 4 demonstrated the fine segmentation results of the U-Net with the developed UGLS strategy for three different objects depicted on fundus and X-ray images, respectively. The U-Net achieved average DS and SEN of 0.8791 and 0.8858 for the OC region, and corresponding values of 0.9605, 0.9607, 0.9621, and 0.9668 for the left and right lungs. Compared with its coarse segmentation results, the U-Net obtained a significantly better overall performance across the six experiments on the two types of images with varying modalities (p < 0.01). Specifically, the U-Net performed better in five of the fine segmentation experiments for the OC, as compared to its coarse results, as shown in Table 3. Similarly, its performance also increased by large margins in each experiment in the fine segmentation of the left and right lungs.

Performance comparison
Table 5 summarized the segmentation results of the involved networks (i.e., the U-Net, AU-Net, BiO-Net, AS-Net, Swin-Unet, and TransUNet) in extracting three different objects from fundus and X-ray images, respectively. As demonstrated by these results, the developed UGLS strategy can significantly improve the performance of the U-Net (p < 0.01) by merely leveraging its coarse segmentation results in a reasonable way, instead of changing its network structure. Specifically, the average DS of the U-Net increased from 0.8792 to 0.8945 over the three object regions depicted on fundus and X-ray images after using our developed deep learning strategy. This strategy made our developed method superior or comparable to the AU-Net (0.8803, p < 0.001), BiO-Net (0.8843, p < 0.005), AS-Net (0.8859, p < 0.005), Swin-Unet (0.8811, p < 0.001), and TransUNet (0.8900, p < 0.05), with all p-values less than 0.05 for the two segmentation tasks. Figures 4 and 5 showed the performance differences among the involved networks on several fundus and X-ray images. Table 6 showed the results of the developed method in extracting the left and right lungs from X-ray images using boundary uncertainty maps in three different ways. As demonstrated by the results, our developed method obtained the lowest segmentation performance, with an average DS of 0.9437, when merely trained on boundary uncertainty maps, but its performance increased when the uncertainty maps were combined with the original images or their background excluded versions for network training (with average DS of 0.9611 and 0.9613, respectively). Moreover, the background excluded images can better improve the performance of our developed method since they reduced the impact of irrelevant background information away from desirable objects.

Effect of parameter r
Table 7 summarized the impact of the parameter r on the performance of the developed method in segmenting three different objects from fundus and X-ray images. The developed method achieved the best overall performance when this parameter was set to 25 for the OC segmentation and 35 for the left and right lung segmentation in the morphological operations and Gaussian filter. These two values ensured a good balance between object information and irrelevant background, enabling our developed method to accurately detect object boundaries. Table 8 showed the performance of the developed method when using different values for the parameter in the morphological operations (r_m) and in the Gaussian filter (r_g). As the table shows, our developed method obtained a superior overall performance when the morphological operations and Gaussian filter shared the same value for each image dataset, which can effectively highlight the center regions of boundary uncertainty maps, as shown in Figure 6.

Discussion
In this paper, we developed a novel network training strategy (termed UGLS) for accurate image segmentation and assessed its effectiveness based on an existing network (i.e., the U-Net) by extracting three different objects (i.e., the OC and the left and right lungs) depicted on fundus and X-ray images. In the developed method, the U-Net was first trained with the traditional training strategy on the original images and their manual annotations for the coarse segmentation of desirable objects. The coarse segmentation results were then used to locate a potential boundary region for each object, which was combined with the original images for the fine segmentation of the objects. We validated the developed method on two public datasets (i.e., REFUGE and TSMC) and compared it with five available networks (i.e., the AU-Net, BiO-Net, AS-Net, Swin-Unet and TransUNet) under similar experiment configurations. Extensive experiments showed that the developed method can largely improve the segmentation performance of the U-Net and was comparable or superior to the AU-Net, BiO-Net, AS-Net, Swin-Unet and TransUNet, all of which have much more complex network structures than the U-Net. The developed method achieved promising overall performance in segmenting multiple different objects, as compared to these existing networks. This may be attributed to the following reasons. First, the coarse segmentation of the objects was able to detect various types of image features and provide important location information for each object and its boundaries. Second, the introduction of boundary uncertainty maps gave the potential boundary region a unique intensity distribution. This distribution largely facilitated the detection of object boundaries and enhanced the sensitivity and accuracy of the U-Net in segmenting objects of interest. Third, the use of background excluded images can not only ensure a reasonable balance between object information and its surrounding background, but also ensure that the U-Net learns various features in the specified region, thereby leading to an increased segmentation performance and a reduced influence of undesirable background. For these reasons, the developed method can significantly improve the segmentation performance of a relatively simple network (i.e., the U-Net) and make it comparable or superior to several existing sophisticated networks.
We further assessed the influence of boundary uncertainty maps and the parameter r on the performance of the developed method.

FIGURE 4
Illustration of the segmentation results of local disc patches (in the first two rows) and their closeup versions (in the last two rows) from eight fundus images obtained by the AU-Net (in green), BiO-Net (in blue), AS-Net (in cyan), Swin-Unet (in black), TransUNet (in orange) and our developed method in the coarse (in red) and fine (in magenta) segmentation stages, as well as their manual delineations (in white), respectively.
Segmentation results in Tables 6-8 showed that: 1) The developed method achieved better segmentation performance when trained on the combination of boundary uncertainty maps and background excluded images, as compared to counterparts trained merely on boundary uncertainty maps or the original images. This may be because boundary uncertainty maps contain too little texture information about target objects and their boundaries, while the original images contain too much background information, both of which can reduce the learning potential of the U-Net and deteriorate its segmentation performance. 2) The developed method obtained relatively high segmentation accuracy when the parameter r was set to 25 for the OC segmentation and 35 for the left and right lung segmentation. This parameter controlled the amount of information about desirable objects and their surrounding background in the boundary uncertainty maps. A proper value can ensure a good balance between the two types of image information and significantly improve the fine segmentation performance of our developed method. If the parameter value was set too small or too large, our developed method would produce a final result that was either very close to its coarse segmentation results or contained lots of undesirable background. 3) The parameter r was used simultaneously in the morphological operations and the Gaussian filter since this ensures that pixels in the center region of a boundary uncertainty map have higher contrast or intensity than those in other regions. 4) Boundary uncertainty maps can be generated using different strategies, but their corresponding segmentation performances were very similar (i.e., 0.8791 vs. 0.8721 for the OC segmentation), based on our previous study (Zhang et al., 2023).

FIGURE 5
Illustration of the segmentation results of nine X-ray images obtained by the AU-Net (in green), BiO-Net (in blue), AS-Net (in cyan), Swin-Unet (in black), TransUNet (in orange) and our developed method in the coarse (in red) and fine (in magenta) segmentation stages, as well as their manual delineations (in white), respectively.

Conclusion
We developed an uncertainty guided deep learning strategy (UGLS) to improve the performance of existing segmentation networks and validated it based on the classical U-Net by segmenting the OC from color fundus images and the left and right lungs from X-ray images. The novelty of our developed method lies in the introduction of boundary uncertainty maps and their integration with the input images for accurate image segmentation. Extensive experiments on public fundus and X-ray image datasets demonstrated that the developed method can effectively extract the OC from fundus images and the left and right lungs from X-ray images, largely improved the performance of the U-Net, and can compete with several sophisticated networks (i.e., the AU-Net, BiO-Net, AS-Net, Swin-Unet, and TransUNet).

FIGURE 2 (
FIGURE 2 (A-C) are the coarse segmentation result, the PBR and the boundary uncertainty map, respectively; (D-F) are the manual annotation of the desirable object, the original image and its background excluded version.
FIGURE 3 (A-C) show a fundus image, its normalized version, and the local disc patch with manual annotations of the OD and OC, respectively; (D) and (E) show an X-ray image and its annotations for the left and right lungs.

FIGURE 6
FIGURE 6 (A) and (B) are the coarse segmentation result of a given fundus image and its corresponding potential boundary region, respectively. (C-E) are the smoothed results of (B) using a Gaussian filter with the parameter r of 15, 25, and 35, respectively.

TABLE 1
Results of our proposed method for the coarse segmentation of the OC regions based on six experiments (i.e., Seg1-6) in terms of the mean and standard deviation (SD) of DS, MCC, SEN and HSD (in pixel).

TABLE 2
The performance of the developed method for the coarse segmentation of the left and right lungs (LL and RL) from X-ray images.

TABLE 3
Fine segmentation results of the developed method for the OC regions in terms of the DS, MCC, SEN and HSD (in pixel) metrics.

TABLE 4
Fine segmentation results of the developed method for segmenting the left and right lungs (LL and RL) from the X-ray images in terms of the DS, MCC, SEN and HSD (in pixel) metrics.

TABLE 5
Performance differences among the involved networks in segmenting the OC, left and right lungs depicted on fundus and X-ray images, respectively.

TABLE 6
The results of the developed method trained on the boundary uncertainty map (BUM) or its combination with the original image (ORI) or its background excluded version (BEI) for the left and right lung segmentation.

TABLE 7
The results of the developed method on fundus and X-ray images obtained by setting different values for the parameter r.

TABLE 8
The results of the developed method for the first experiment on fundus and X-ray images using different values for the parameter r in the morphological operations and Gaussian filter (short for r_m and r_g, respectively).