Difference of anisotropic and isotropic TV for segmentation under blur and Poisson noise

In this paper, we aim to segment an image degraded by blur and Poisson noise. We adopt a smoothing-and-thresholding (SaT) segmentation framework that first finds a piecewise-smooth solution, followed by k-means clustering to segment the image. Specifically, for the image smoothing step, we replace the least-squares fidelity for Gaussian noise in the Mumford-Shah model with a maximum a posteriori (MAP) term to deal with Poisson noise, and we incorporate the weighted difference of anisotropic and isotropic total variation (AITV) as a regularization to promote the sparsity of image gradients. For such a nonconvex model, we develop a specific splitting scheme and utilize a proximal operator to apply the alternating direction method of multipliers (ADMM). Convergence analysis is provided to validate the efficacy of the ADMM scheme. Numerical experiments on various segmentation scenarios (grayscale/color and multiphase) showcase that our proposed method outperforms a number of segmentation methods, including the original SaT.


Introduction
Image segmentation partitions an image into multiple coherent regions, where pixels of one region share similar characteristics such as colors, textures, and edges. It remains an important yet challenging problem in computer vision with various applications, including magnetic resonance imaging [26,40,62] and microscopy [7,81]. One of the most fundamental models for image segmentation is the Mumford-Shah model [50] because of its robustness to noise. Given an input image f : Ω → R defined on an open, bounded, and connected domain Ω ⊂ R², the Mumford-Shah model is formulated as

$$\min_{u,\Gamma}\; \frac{\lambda}{2}\int_{\Omega} (f-u)^2\,dx + \frac{\mu}{2}\int_{\Omega\setminus\Gamma} |\nabla u|^2\,dx + \operatorname{Length}(\Gamma), \tag{1}$$

where u : Ω → R is a piecewise-smooth approximation of the image f, Γ ⊂ Ω is a compact curve representing the region boundaries, and λ, µ > 0 are the weight parameters. The first term in (1) is the fidelity term that ensures that the solution u approximates the image f. The second term enforces u to be piecewise smooth on Ω \ Γ. The last term measures the perimeter, or more precisely the one-dimensional Hausdorff measure in R² [4], of the curve Γ. However, (1) is difficult to solve because the unknown set of boundaries Γ needs to be discretized. One common approach approximates the objective function in (1) by a sequence of elliptic functionals [1].
Alternatively, Chan and Vese (CV) [18] simplified (1) by assuming the solution u to be piecewise constant with two phases or regions, thereby making the model easier to solve via the level-set method [51]. Let the level-set function ϕ be Lipschitz continuous and defined as follows:

$$\phi(x)\; \begin{cases} > 0 & \text{if } x \text{ is inside } \Gamma, \\ = 0 & \text{if } x \text{ is on } \Gamma, \\ < 0 & \text{if } x \text{ is outside } \Gamma. \end{cases}$$

By the definition of ϕ, the curve Γ is represented by the zero level set ϕ(x) = 0, and each image region is defined as either inside or outside the curve Γ. In short, the CV model is formulated as

$$\min_{c_1,c_2,\phi}\; \lambda \int_{\Omega} \Big[(f-c_1)^2 H(\phi) + (f-c_2)^2\big(1-H(\phi)\big)\Big]\,dx + \nu \int_{\Omega} |\nabla H(\phi)|\,dx, \tag{2}$$

where λ, ν are weight parameters, the constants c₁, c₂ are the mean intensity values of the two regions, and H(ϕ) is the Heaviside function defined by H(ϕ) = 1 if ϕ ≥ 0 and H(ϕ) = 0 otherwise. A convex relaxation [17] of (2) was formulated as

$$\min_{0 \le u \le 1}\; \lambda \int_{\Omega} \Big[(f-c_1)^2 - (f-c_2)^2\Big]\, u\, dx + \int_{\Omega} |\nabla u|\,dx,$$

where an image segmentation ũ is obtained by thresholding u, that is, ũ = χ_{\{u ≥ τ\}} for some value τ ∈ (0, 1). It can be solved efficiently by convex optimization algorithms, such as the alternating direction method of multipliers (ADMM) [6] and the primal-dual hybrid gradient method [15]. A multiphase extension of (2) was proposed in [64], but it requires the number of regions to be a power of 2. For segmenting an arbitrary number of regions, fuzzy membership functions were incorporated [38].
Cai et al. [12] proposed the smoothing-and-thresholding (SaT) framework, which is related to the model (1). In the smoothing step of SaT, a convex variant of (1) is formulated as

$$u^* = \arg\min_{u}\; \frac{\lambda}{2}\int_{\Omega} (f - Au)^2\,dx + \frac{\mu}{2}\int_{\Omega} |\nabla u|^2\,dx + \int_{\Omega} |\nabla u|\,dx, \tag{3}$$

yielding a piecewise-smooth solution u*. The blurring operator A is included in the case when the image f is blurred. The total variation (TV) term ∫_Ω |∇u| dx is a convex approximation of the length term in (2) by the coarea formula [17]. After the smoothing step, a thresholding step is applied to the smooth image u* to segment it into multiple regions. The two-stage framework has many advantages. First, the smoothing model (3) is strongly convex, so it can be solved by any convex optimization algorithm to obtain a unique solution u*. Second, thanks to the flexibility of the thresholding step, the user can adjust the number of thresholds and the threshold values to obtain a satisfactory segmentation of u*. Furthermore, the SaT framework can be adapted to color images by incorporating an intermediate lifting step [10]. Before the thresholding step, the lifting step converts the RGB space to the Lab (perceived lightness, red-green, and yellow-blue) color space and concatenates both RGB and Lab intensity values into a six-channel image. The resulting multi-stage framework for color image segmentation is called smoothing, lifting, and thresholding (SLaT).
One limitation of (3) lies in the ℓ₂ fidelity term, which is statistically designed for images corrupted by additive Gaussian noise; as a result, the smoothing step is not applicable to other types of noise distribution. In this paper, we focus on Poisson noise, which is commonly encountered when an image is taken by photon-capturing devices, such as in positron emission tomography [63] and astronomical imaging [36].
In this paper, we propose an AITV variant of (4) to improve the smoothing step of the SaT/SLaT framework for images degraded by Poisson noise and/or blur. Incorporating AITV regularization is motivated by our previous works [8,9,53], where we demonstrated that AITV regularization is effective in preserving edges and details, especially under Gaussian and impulsive noise. To maintain computational efficiency similar to that of the original SaT/SLaT framework, we propose an ADMM algorithm that utilizes the ℓ₁ − αℓ₂ proximal operator [46]. The main contributions of this paper are as follows:

• We propose an AITV-regularized variant of (4) and prove the existence of a minimizer for the model.
• We develop a computationally efficient ADMM algorithm and provide its convergence analysis under certain conditions.
• We conduct numerical experiments on various grayscale/color images to demonstrate the effectiveness of the proposed approach.
The rest of the paper is organized as follows. Section 2 reviews background information, such as notation, Poisson noise, and the SaT/SLaT framework. In Section 3, we propose a simplified Mumford-Shah model with AITV regularization and a MAP data fidelity term for Poisson noise. In the same section, we show that the model has a global minimizer and develop an ADMM algorithm with convergence analysis. In Section 4, we evaluate the performance of the AITV Poisson SaT/SLaT framework on various grayscale and color images.
Lastly, we conclude the paper in Section 5.

Notation
Throughout the rest of the paper, we represent images and mathematical models in discrete notation (i.e., vectors and matrices). An image is represented as an M × N matrix, and hence the image domain is denoted by Ω = {1, 2, . . . , M} × {1, 2, . . . , N}. We define two inner product spaces: X := R^{M×N} and Y := X × X. Let u ∈ X. As shorthand notation, we write u ≥ 0 if u_{i,j} ≥ 0 for all (i, j) ∈ Ω. The discrete gradient operator ∇ : X → Y is defined by (∇u)_{i,j} = ((∇_x u)_{i,j}, (∇_y u)_{i,j}), where

$$(\nabla_x u)_{i,j} = \begin{cases} u_{i+1,j} - u_{i,j} & \text{if } i < M, \\ 0 & \text{if } i = M, \end{cases} \qquad (\nabla_y u)_{i,j} = \begin{cases} u_{i,j+1} - u_{i,j} & \text{if } j < N, \\ 0 & \text{if } j = N. \end{cases}$$

The space X is equipped with the standard inner product ⟨·, ·⟩_X and Euclidean norm ∥·∥₂. The space Y has the following inner product and norms: for p = (p₁, p₂) ∈ Y and q = (q₁, q₂) ∈ Y,

$$\langle p, q\rangle_Y = \langle p_1, q_1\rangle_X + \langle p_2, q_2\rangle_X, \qquad \|p\|_1 = \sum_{(i,j)\in\Omega} \big(|(p_1)_{i,j}| + |(p_2)_{i,j}|\big),$$
$$\|p\|_{2,1} = \sum_{(i,j)\in\Omega} \sqrt{(p_1)_{i,j}^2 + (p_2)_{i,j}^2}, \qquad \|p\|_2 = \sqrt{\langle p, p\rangle_Y}.$$

For brevity, we omit the subscript X or Y in the inner products and norms when the context is clear.
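For concreteness, the discrete gradient and its negative adjoint (a discrete divergence) can be implemented in a few lines. The following is a minimal NumPy sketch assuming the forward-difference scheme above with zero entries in the last row/column; it is an illustration, not the authors' reference code.

```python
import numpy as np

def grad(u):
    """Discrete gradient (grad_x u, grad_y u): forward differences,
    zero in the last row/column (Neumann-type boundary)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, defined so that <grad u, p>_Y = -<u, div(p)>_X."""
    dx = np.zeros_like(px)
    dx[0, :] = px[0, :]
    dx[1:-1, :] = px[1:-1, :] - px[:-2, :]
    dx[-1, :] = -px[-2, :]
    dy = np.zeros_like(py)
    dy[:, 0] = py[:, 0]
    dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]
    dy[:, -1] = -py[:, -2]
    return dx + dy
```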

AITV Regularization
There are two popular discretizations of total variation: the isotropic TV [58] and the anisotropic TV [21], defined respectively by

$$\mathrm{TV}_{\mathrm{iso}}(u) = \sum_{(i,j)\in\Omega} \sqrt{(\nabla_x u)_{i,j}^2 + (\nabla_y u)_{i,j}^2} = \|\nabla u\|_{2,1},$$
$$\mathrm{TV}_{\mathrm{ani}}(u) = \sum_{(i,j)\in\Omega} \big(|(\nabla_x u)_{i,j}| + |(\nabla_y u)_{i,j}|\big) = \|\nabla u\|_1.$$

This work is based on the weighted difference of anisotropic and isotropic TV (AITV) regularization [48], defined by

$$\mathrm{AITV}(u) = \|\nabla u\|_1 - \alpha\|\nabla u\|_{2,1}$$

for a weighting parameter α ∈ [0, 1]. The range of α ensures the non-negativity of the AITV regularization.
Note that the anisotropic TV sums the ℓ₁ norm of the image gradient ((∇_x u)_{i,j}, (∇_y u)_{i,j}) over the pixel locations (i, j) ∈ Ω, while the isotropic TV applies the ℓ₂ norm to each gradient vector. As a result, AITV can be viewed as the ℓ₁ − αℓ₂ regularization applied to the gradient vector at every pixel, thereby enforcing sparsity of each gradient vector individually.
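As an illustration, the AITV value of an image can be evaluated directly from these definitions; a short sketch reusing the grad helper from the Notation section:

```python
import numpy as np

def aitv(u, alpha):
    """AITV(u) = ||grad u||_1 - alpha * ||grad u||_{2,1}.
    Uses the grad() helper sketched in the Notation section."""
    gx, gy = grad(u)
    l1 = np.abs(gx) + np.abs(gy)     # anisotropic contribution per pixel
    l2 = np.sqrt(gx**2 + gy**2)      # isotropic contribution per pixel
    return float(np.sum(l1 - alpha * l2))
```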

Poisson Noise
Poisson noise follows the Poisson distribution with mean and variance η, whose probability mass function is given by

$$P_\eta(k) = \frac{\eta^k e^{-\eta}}{k!}, \qquad k = 0, 1, 2, \ldots \tag{6}$$

For a clean image g ∈ X, its intensity value at each pixel g_{i,j} serves as the mean and variance of the corresponding noisy observation f ∈ X, i.e., f_{i,j} ∼ Poisson(g_{i,j}). To recover the image g from the noisy image f, we find the maximum a posteriori (MAP) estimate u, which maximizes the probability P(u|f). By Bayes' theorem, we have

$$P(u|f) = \frac{P(f|u)P(u)}{P(f)} \propto P(f|u)P(u).$$
It further follows from the definition (6) that

$$P(f_{i,j}|u_{i,j})P(u_{i,j}) = P_{u_{i,j}}(f_{i,j})P(u_{i,j}) = \frac{e^{-u_{i,j}}\, u_{i,j}^{f_{i,j}}}{(f_{i,j})!}\, P(u_{i,j}).$$
Since Poisson noise is i.i.d. pixelwise, we have

$$P(u|f) \propto \prod_{(i,j)\in\Omega} P(f_{i,j}|u_{i,j})\,P(u_{i,j}).$$
The MAP estimate of P(u|f ) is equivalent to its negative logarithm, thus leading to the following optimization problem: The last term − log P(u i,j ) can be regarded as an image prior or a regularization.For example, Le et al. [37] considered the isotropic total variation as the image prior and proposed a Poisson denoising model where log is applied pixelwise and 1 is the matrix whose entries are all 1's.The first term in (8) is a concise notation that is commonly used as a fidelity term for Poisson denoising in various imaging applications [16,19,22,23,37,70].

Review of Poisson SaT/SLaT
A Poisson SaT framework [16] consists of two steps. Given a noisy grayscale image f ∈ X corrupted by Poisson noise, the first step is the smoothing step that finds a piecewise-smooth solution u* from the optimization model

$$u^* = \arg\min_{u > 0}\; \lambda\,\langle Au - f \circ \log(Au), \mathbf{1}\rangle + \frac{\mu}{2}\|\nabla u\|_2^2 + \|\nabla u\|_{2,1}. \tag{9}$$

Then in the thresholding step, K − 1 threshold values τ₁ ≤ τ₂ ≤ . . . ≤ τ_{K−1} are appropriately chosen to segment u* into K regions, where the kth region is given by

$$\Omega_k = \{x \in \Omega : \tau_{k-1} \le u^*(x) < \tau_k\}, \qquad k = 1, \ldots, K,$$

with τ₀ := inf_{x∈Ω} u*(x). The thresholding step is typically performed by k-means clustering.
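The thresholding step can be realized with an off-the-shelf k-means routine. A minimal sketch follows; scikit-learn is one possible choice, not mandated by the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def threshold_kmeans(u_smooth, K):
    """Cluster the intensities of the smoothed image u* into K regions.
    Returns an M x N label map and the K cluster centroids."""
    features = u_smooth.reshape(-1, 1)
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(features)
    return km.labels_.reshape(u_smooth.shape), km.cluster_centers_.ravel()
```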
The Poisson smoothing, lifting, and thresholding (SLaT) framework [10] extends the Poisson SaT framework to color images. For a color image f = (f₁, f₂, f₃), the smoothing step (9) is applied to each channel separately, yielding a piecewise-smooth solution u* = (u₁*, u₂*, u₃*). An additional lifting step [49] transforms u* from the RGB space to (ū₁*, ū₂*, ū₃*) in the Lab space (perceived lightness, red-green, and yellow-blue). The channels in the Lab space are less correlated than in the RGB space, so they may carry complementary information useful for segmentation. The RGB image and the Lab image are concatenated to form the multichannel image (u₁*, u₂*, u₃*, ū₁*, ū₂*, ū₃*), followed by the thresholding stage. Generally, k-means clustering yields K centroids c₁, . . . , c_K as constant vectors, which are used to form the regions Ω₁, . . . , Ω_K. After the thresholding step for both SaT/SLaT, we define a piecewise-constant approximation of the image f by f̃ = (f̃₁, . . . , f̃_d) such that

$$\tilde f_\ell = \sum_{k=1}^{K} c_{k,\ell}\, \chi_{\Omega_k}, \qquad \ell = 1, \ldots, d, \tag{10}$$

where c_{k,ℓ} is the ℓth entry of the constant vector c_k and χ_{Ω_k} is the characteristic function of Ω_k. Recall that d = 1 when f is grayscale, and d = 3 when f is color.
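The lifting step admits a similarly short sketch. Here scikit-image's rgb2lab stands in for the RGB-to-Lab conversion, and the rescaling of the Lab channels to a range comparable with RGB is an assumption the paper does not spell out.

```python
import numpy as np
from skimage import color

def lift_rgb(u_rgb):
    """SLaT lifting: append Lab channels to the smoothed RGB image u*,
    producing a six-channel image for the k-means thresholding stage."""
    u_lab = color.rgb2lab(np.clip(u_rgb, 0.0, 1.0))
    # Rescale each Lab channel to [0, 1] so that no channel dominates k-means.
    u_lab = (u_lab - u_lab.min(axis=(0, 1))) / (np.ptp(u_lab, axis=(0, 1)) + 1e-12)
    return np.concatenate([u_rgb, u_lab], axis=2)   # shape (M, N, 6)
```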

Proposed Approach
To improve the Poisson SaT/SLaT framework, we propose to replace the isotropic TV in (9) with the AITV regularization. In other words, in the smoothing step, we obtain the smoothed image u* from the optimization problem

$$u^* = \arg\min_{u > 0}\; F(u) := \lambda\,\langle Au - f \circ \log(Au), \mathbf{1}\rangle + \frac{\mu}{2}\|\nabla u\|_2^2 + \big(\|\nabla u\|_1 - \alpha\|\nabla u\|_{2,1}\big) \tag{11}$$

for α ∈ [0, 1]. We establish that this model admits a global solution. We then develop an ADMM algorithm to find a solution and provide its convergence analysis. The overall segmentation approach is summarized in Algorithm 1, whose inputs are the blurring operator A, the fidelity parameter λ > 0, the smoothing parameter µ ≥ 0, and the number of regions K, and whose outputs are the regions {Ω_k}_{k=1}^K together with the piecewise-constant approximation f̃ computed by (10).

Model Analysis
To establish the existence of solutions to the proposed model (11), we start with Lemma 1, a discrete version of the Poincaré inequality [27]. We then prove Lemma 2 and Proposition 3, leading to the global existence theorem (Theorem 4).
Lemma 1. There exists a constant C > 0 such that for every u ∈ X and its mean value ū := (1/(MN)) Σ_{(i,j)∈Ω} u_{i,j}, we have

$$\|u - \bar u \mathbf{1}\|_2 \le C\,\|\nabla u\|_{2,1}.$$

Proof. We prove it by contradiction. Suppose there exists a sequence {u_k}_{k=1}^∞ such that

$$\|u_k - \bar u_k \mathbf{1}\|_2 > k\,\|\nabla u_k\|_{2,1}, \tag{13}$$

where ū_k denotes the mean value of u_k. For every k, we normalize each element of the sequence by

$$v_k := \frac{u_k - \bar u_k \mathbf{1}}{\|u_k - \bar u_k \mathbf{1}\|_2}, \quad \text{so that } \|v_k\|_2 = 1 \text{ and } \bar v_k = 0. \tag{14}$$

By (13), we have

$$\|\nabla v_k\|_{2,1} = \frac{\|\nabla u_k\|_{2,1}}{\|u_k - \bar u_k \mathbf{1}\|_2} < \frac{1}{k}. \tag{15}$$

As {v_k}_{k=1}^∞ is bounded, it has a subsequence converging to some v* ∈ X, and it follows from (15) that ∥∇v*∥_{2,1} = 0. Since ker(∇) = {c1 : c ∈ R}, v* is a constant vector. However, (14) implies that v̄* = 0 and ∥v*∥₂ = 1, so v* = 0 while having unit norm. This contradiction proves the lemma.
Lemma 2. Suppose ∥f∥_∞ < ∞ and min_{i,j} f_{i,j} > 0. Then there exists a scalar u₀ > 0 such that 2(x − f_{i,j} log x) ≥ x for any x ≥ u₀ and (i, j) ∈ Ω.
Proof. For each (i, j) ∈ Ω, we show that there exists u_{i,j} > 0 such that H(x) := x − 2f_{i,j} log x ≥ 0 for x ≥ u_{i,j}. Since H is strictly convex and attains its global minimum at x = 2f_{i,j}, it is increasing on the domain x > 2f_{i,j}. Additionally, since x dominates log x as x → +∞, there exists u_{i,j} > 2f_{i,j} > 0 such that u_{i,j}/log u_{i,j} ≥ 2f_{i,j}, which implies that H(u_{i,j}) = u_{i,j} − 2f_{i,j} log u_{i,j} ≥ 0. As a result, for x ≥ u_{i,j} > 2f_{i,j}, we obtain x − 2f_{i,j} log x = H(x) ≥ H(u_{i,j}) ≥ 0. Define u₀ := max_{i,j} u_{i,j}; then 2(x − f_{i,j} log x) ≥ x for all x ≥ u₀ and (i, j) ∈ Ω.

Proposition 3. Suppose ker(A) ∩ ker(∇) = {0}. If the sequences {Au_k}_{k=1}^∞ and {∇u_k}_{k=1}^∞ are bounded, then {u_k}_{k=1}^∞ is bounded.

Proof. Since ker(A) ∩ ker(∇) = {0} and 1 ∈ ker(∇), we have A1 ≠ 0. Simple calculations lead to

$$\|Au_k\|_2 \ge |\bar u_k|\,\|A\mathbf{1}\|_2 - \|A(u_k - \bar u_k \mathbf{1})\|_2 \ge |\bar u_k|\,\|A\mathbf{1}\|_2 - C\,\|A\|\,\|\nabla u_k\|_{2,1}, \tag{16}$$

where the last inequality is due to Lemma 1. The boundedness of {Au_k}_{k=1}^∞ and {∇u_k}_{k=1}^∞ implies that {ū_k}_{k=1}^∞ is also bounded by (16). We apply Lemma 1 again to obtain

$$\|u_k\|_2 \le \|u_k - \bar u_k \mathbf{1}\|_2 + |\bar u_k|\,\|\mathbf{1}\|_2 \le C\,\|\nabla u_k\|_{2,1} + |\bar u_k|\sqrt{MN},$$

so {u_k}_{k=1}^∞ is bounded.

Finally, we adapt the proof in [16] to establish that F has a global minimizer.

Theorem 4. Suppose ∥f∥_∞ < ∞, min_{i,j} f_{i,j} > 0, α ∈ [0, 1), and ker(A) ∩ ker(∇) = {0}. Then F defined in (11) has a global minimizer.
Proof. It is straightforward that ∥∇u∥_{2,1} ≤ ∥∇u∥₁, and thus

$$\|\nabla u\|_1 - \alpha\|\nabla u\|_{2,1} \ge (1-\alpha)\|\nabla u\|_1 \ge 0 \quad \text{for } \alpha \in [0, 1).$$

Given a scalar f̄ > 0, the function G(x) = x − f̄ log x attains its global minimum at x = f̄. Therefore, we have x − f_{i,j} log x ≥ f_{i,j} − f_{i,j} log f_{i,j} for all x > 0 and (i, j) ∈ Ω, which leads to a lower bound of F(u), i.e.,

$$F(u) \ge \lambda \sum_{(i,j)\in\Omega} \big(f_{i,j} - f_{i,j}\log f_{i,j}\big) =: F_0. \tag{17}$$

As F is bounded below by F₀, we can choose a minimizing sequence {u_k}_{k=1}^∞, and F(u_k) has a uniform upper bound, denoted by B₁, i.e., F(u_k) < B₁ for all k ∈ N. It further follows from (17) that each term of F(u_k) is uniformly bounded. Using these uniform bounds, we derive that

$$(1-\alpha)\|\nabla u_k\|_1 \le \|\nabla u_k\|_1 - \alpha\|\nabla u_k\|_{2,1} \le F(u_k) - F_0 < B_1 - F_0.$$

As α < 1, the sequence {∇u_k}_{k=1}^∞ is bounded. To prove the boundedness of {Au_k}_{k=1}^∞, we introduce the notation x⁺ = max(x, 0) and x⁻ = −min(x, 0) for any x ∈ R, so that x = x⁺ − x⁻. Write h_{i,j}(x) := x − f_{i,j} log x. Since h_{i,j}(x) ≥ f_{i,j} − f_{i,j} log f_{i,j}, the negative parts satisfy Σ_{(i,j)} (h_{i,j}((Au_k)_{i,j}))⁻ ≤ MN max_{i,j} |f_{i,j} − f_{i,j} log f_{i,j}| =: B₂, and from F(u_k) < B₁ we obtain Σ_{(i,j)} (h_{i,j}((Au_k)_{i,j}))⁺ ≤ B₁/λ + B₂. By Lemma 2, there exists u₀ > 0 such that 2(x − f_{i,j} log x) ≥ x, ∀x ≥ u₀ and (i, j) ∈ Ω. We observe that

$$\langle Au_k, \mathbf{1}\rangle = \sum_{(Au_k)_{i,j} < u_0} (Au_k)_{i,j} + \sum_{(Au_k)_{i,j} \ge u_0} (Au_k)_{i,j} \le MN u_0 + 2\sum_{(i,j)\in\Omega} \big(h_{i,j}((Au_k)_{i,j})\big)^+ .$$

This shows that {Au_k}_{k=1}^∞ is bounded, and hence {u_k}_{k=1}^∞ is bounded due to Proposition 3. Therefore, there exists a subsequence {u_{k_n}}_{n=1}^∞ that converges to some u* ∈ X. As F is continuous and thus lower semicontinuous, we have

$$F(u^*) \le \liminf_{n\to\infty} F(u_{k_n}) = \inf_{u} F(u),$$

which means that u* minimizes F.

Numerical Algorithm
To minimize (11), we introduce two auxiliary variables v ∈ X and w = (w_x, w_y) ∈ Y, leading to an equivalent constrained optimization problem:

$$\min_{u,v,w}\; \lambda\,\langle v - f \circ \log v, \mathbf{1}\rangle + \frac{\mu}{2}\|\nabla u\|_2^2 + \|w\|_1 - \alpha\|w\|_{2,1} \quad \text{s.t.}\; v = Au,\; w = \nabla u. \tag{18}$$

The corresponding augmented Lagrangian is expressed as

$$\begin{aligned} L_{\beta_1,\beta_2}(u,v,w,y,z) ={}& \lambda\,\langle v - f \circ \log v, \mathbf{1}\rangle + \frac{\mu}{2}\|\nabla u\|_2^2 + \|w\|_1 - \alpha\|w\|_{2,1} \\ & + \langle y, Au - v\rangle + \frac{\beta_1}{2}\|Au - v\|_2^2 + \langle z, \nabla u - w\rangle + \frac{\beta_2}{2}\|\nabla u - w\|_2^2, \end{aligned} \tag{19}$$

where y ∈ X and z = (z_x, z_y) ∈ Y are Lagrange multipliers and β₁, β₂ are positive penalty parameters. We then apply the alternating direction method of multipliers (ADMM) to minimize (19), which consists of the following steps per iteration k:

$$u_{k+1} = \arg\min_u L_{\beta_{1,k},\beta_{2,k}}(u, v_k, w_k, y_k, z_k), \tag{21a}$$
$$v_{k+1} = \arg\min_v L_{\beta_{1,k},\beta_{2,k}}(u_{k+1}, v, w_k, y_k, z_k), \tag{21b}$$
$$w_{k+1} = \arg\min_w L_{\beta_{1,k},\beta_{2,k}}(u_{k+1}, v_{k+1}, w, y_k, z_k), \tag{21c}$$
$$y_{k+1} = y_k + \beta_{1,k}(Au_{k+1} - v_{k+1}), \tag{21d}$$
$$z_{k+1} = z_k + \beta_{2,k}(\nabla u_{k+1} - w_{k+1}), \tag{21e}$$
$$\beta_{1,k+1} = \sigma\beta_{1,k}, \quad \beta_{2,k+1} = \sigma\beta_{2,k}, \tag{21f}$$

where σ > 1.
Remark 1. The scheme presented in (21) differs slightly from the original ADMM [6], in which σ = 1 in (21f). Having σ > 1 increases the penalty parameters β_{1,k}, β_{2,k} at each iteration k, thereby accelerating the numerical convergence of the proposed ADMM algorithm. A similar technique has been used in [14,30,60,61,79].
All the subproblems (21a)-(21c) have closed-form solutions. In particular, the first-order optimality condition for (21a) is

$$\big(\beta_{1,k}A^\top A - (\mu + \beta_{2,k})\Delta\big)\, u = A^\top(\beta_{1,k} v_k - y_k) + \nabla^\top(\beta_{2,k} w_k - z_k), \tag{22}$$

where Δ := −∇^⊤∇ denotes the discrete Laplacian. Since ker(A) ∩ ker(∇) = {0}, the coefficient matrix β_{1,k}A^⊤A − (µ + β_{2,k})Δ is positive definite and thereby invertible, which implies that (22) has a unique solution u_{k+1}. By assuming a periodic boundary condition for u, the operators Δ and A^⊤A are block circulant [68], and hence (22) can be solved efficiently by the 2D discrete Fourier transform F. Specifically, we have the formula

$$u_{k+1} = \mathcal F^{-1}\!\left(\frac{\mathcal F(A)^* \circ \mathcal F(\beta_{1,k} v_k - y_k) + \mathcal F(\nabla)^* \circ \mathcal F(\beta_{2,k} w_k - z_k)}{\beta_{1,k}\,\mathcal F(A)^* \circ \mathcal F(A) + (\mu + \beta_{2,k})\,\mathcal F(\nabla)^* \circ \mathcal F(\nabla)}\right), \tag{23}$$

where F⁻¹ is the inverse discrete Fourier transform, the superscript * denotes complex conjugation, the operation ◦ is componentwise multiplication, and the division is componentwise. By differentiating the objective function of (21b) with respect to v and setting the result to zero, we obtain the closed-form solution

$$v_{k+1} = \frac{c_k + \sqrt{c_k \circ c_k + 4\lambda\beta_{1,k} f}}{2\beta_{1,k}}, \qquad c_k := \beta_{1,k}Au_{k+1} + y_k - \lambda\mathbf{1},$$

where the square root, squaring, and division are performed componentwise. Lastly, the w-subproblem (21c) can be decomposed componentwise as

$$(w_{i,j})_{k+1} = \arg\min_{w_{i,j}}\; \|w_{i,j}\|_1 - \alpha\|w_{i,j}\|_2 + \frac{\beta_{2,k}}{2}\Big\|w_{i,j} - \Big((\nabla u_{k+1})_{i,j} + \frac{(z_k)_{i,j}}{\beta_{2,k}}\Big)\Big\|_2^2 = \mathrm{prox}\Big((\nabla u_{k+1})_{i,j} + \frac{(z_k)_{i,j}}{\beta_{2,k}},\, \alpha,\, \beta_{2,k}\Big),$$

where the proximal operator for ℓ₁ − αℓ₂ on x ∈ Rⁿ is given by

$$\mathrm{prox}(x, \alpha, \beta) = \arg\min_{y \in \mathbb R^n}\; \|y\|_1 - \alpha\|y\|_2 + \frac{\beta}{2}\|y - x\|_2^2.$$

The proximal operator for ℓ₁ − αℓ₂ has a closed-form solution summarized in Lemma 5.

Lemma 5 ([46]). Given x ∈ Rⁿ, α ∈ [0, 1], and β > 0, the solution x* = prox(x, α, β) is given as follows:

(a) When ∥x∥_∞ > 1/β, we have x* = z(∥z∥₂ + α/β)/∥z∥₂, where z = sign(x) ◦ max(|x| − 1/β, 0) is the soft-thresholded vector.

(b) When (1 − α)/β < ∥x∥_∞ ≤ 1/β, x* is a 1-sparse vector with one nonzero element equal to (∥x∥_∞ + (α − 1)/β) sign(x_i) at an index i satisfying |x_i| = ∥x∥_∞, and the remaining elements equal to 0.

(c) When ∥x∥_∞ ≤ (1 − α)/β, we have x* = 0.
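Lemma 5 translates directly into code. Below is a sketch of the closed-form proximal operator for x ∈ Rⁿ (in our setting n = 2, one gradient vector per pixel); it is an illustration of the closed form in [46], not the authors' implementation.

```python
import numpy as np

def prox_l1_alpha_l2(x, alpha, beta):
    """Closed-form prox of ||.||_1 - alpha*||.||_2 (Lemma 5):
    argmin_y ||y||_1 - alpha*||y||_2 + (beta/2)*||y - x||_2^2."""
    x = np.asarray(x, dtype=float)
    xmax = np.max(np.abs(x))
    if xmax > 1.0 / beta:                    # case (a): soft-threshold, then rescale
        z = np.sign(x) * np.maximum(np.abs(x) - 1.0 / beta, 0.0)
        return z * (np.linalg.norm(z) + alpha / beta) / np.linalg.norm(z)
    if xmax > (1.0 - alpha) / beta:          # case (b): 1-sparse solution
        y = np.zeros_like(x)
        i = int(np.argmax(np.abs(x)))
        y[i] = (xmax + (alpha - 1.0) / beta) * np.sign(x[i])
        return y
    return np.zeros_like(x)                  # case (c): zero
```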
In summary, we describe the ADMM scheme to solve (11) in Algorithm 2.
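The following NumPy sketch assembles the pieces into the full iteration (21): the FFT-based u-update (23) under periodic boundary conditions, the componentwise v-update, the pixelwise prox for w, the multiplier updates, and the penalty growth by σ. It assumes a normalized blur kernel supplied via its 2D FFT and reuses the grad and prox_l1_alpha_l2 sketches above; the default parameter values are illustrative, not the paper's settings, and for strict consistency the w-update gradient should also use circular differences, a boundary detail the sketch glosses over.

```python
import numpy as np

def u_update(v, w, y, z, Ahat, mu, beta1, beta2):
    """Solve (22) by 2D FFT assuming periodic boundaries.
    Ahat is the 2D FFT of the blur kernel zero-padded to the image size."""
    M, N = v.shape
    dx = np.zeros((M, N)); dx[0, 0] = -1.0; dx[-1, 0] = 1.0   # periodic forward diff (x)
    dy = np.zeros((M, N)); dy[0, 0] = -1.0; dy[0, -1] = 1.0   # periodic forward diff (y)
    Dx, Dy = np.fft.fft2(dx), np.fft.fft2(dy)
    denom = beta1 * np.abs(Ahat)**2 + (mu + beta2) * (np.abs(Dx)**2 + np.abs(Dy)**2)
    rhs = (np.conj(Ahat) * np.fft.fft2(beta1 * v - y)
           + np.conj(Dx) * np.fft.fft2(beta2 * w[0] - z[0])
           + np.conj(Dy) * np.fft.fft2(beta2 * w[1] - z[1]))
    return np.real(np.fft.ifft2(rhs / denom))

def admm_aitv_poisson(f, Ahat, lam, mu, alpha, beta=1.0, sigma=1.05, iters=300):
    """Sketch of Algorithm 2 for the smoothing model (11), with beta1 = beta2 = beta."""
    applyA = lambda x: np.real(np.fft.ifft2(Ahat * np.fft.fft2(x)))
    u, v = f.copy(), f.copy()
    w = [np.zeros_like(f), np.zeros_like(f)]
    y, z = np.zeros_like(f), [np.zeros_like(f), np.zeros_like(f)]
    for _ in range(iters):
        u = u_update(v, w, y, z, Ahat, mu, beta, beta)
        Au = applyA(u)
        c = beta * Au + y - lam                        # v-update, closed form
        v = (c + np.sqrt(c**2 + 4.0 * lam * beta * f)) / (2.0 * beta)
        gx, gy = grad(u)                               # w-update via Lemma 5, per pixel
        for i in range(f.shape[0]):
            for j in range(f.shape[1]):
                p = prox_l1_alpha_l2(np.array([gx[i, j] + z[0][i, j] / beta,
                                               gy[i, j] + z[1][i, j] / beta]), alpha, beta)
                w[0][i, j], w[1][i, j] = p[0], p[1]
        y = y + beta * (Au - v)                        # multiplier updates (21d)-(21e)
        z[0] = z[0] + beta * (gx - w[0])
        z[1] = z[1] + beta * (gy - w[1])
        beta *= sigma                                  # penalty growth (21f)
    return u
```

The double loop over pixels is written for clarity; in practice the three cases of Lemma 5 vectorize across all pixels at once.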

Convergence Analysis
We establish the subsequential convergence of the ADMM scheme described in Algorithm 2. The global convergence theory of ADMM [69] is inapplicable to our model because the gradient operator ∇ is non-surjective; this will be further investigated in future work. For brevity, we set β = β₁ = β₂ and denote L_β(u, v, w, y, z) := L_{β,β}(u, v, w, y, z).
In addition, we introduce the definition of subdifferentials [57], which is used to define stationary points of a non-smooth objective function.
The optimality conditions of the subproblems (21a)-(21c) at iteration k_n are the following:

$$0 = \mu\nabla^\top\nabla u_{k_n+1} + A^\top y_{k_n} + \beta_{k_n}A^\top(Au_{k_n+1} - v_{k_n}) + \nabla^\top z_{k_n} + \beta_{k_n}\nabla^\top(\nabla u_{k_n+1} - w_{k_n}), \tag{43a}$$
$$0 = \lambda(\mathbf{1} - f \oslash v_{k_n+1}) - y_{k_n} - \beta_{k_n}(Au_{k_n+1} - v_{k_n+1}), \tag{43b}$$
$$0 \in \partial\big(\|w_{k_n+1}\|_1 - \alpha\|w_{k_n+1}\|_{2,1}\big) - z_{k_n} - \beta_{k_n}(\nabla u_{k_n+1} - w_{k_n+1}), \tag{43c}$$

where ⊘ denotes componentwise division. Expanding (43a) by substituting in (21d)-(21e) and taking the limit, we have

$$0 = \mu\nabla^\top\nabla u^* + A^\top y^* + \nabla^\top z^*.$$

Substituting (21d) into (43b) and taking the limit gives us

$$y^* = \lambda(\mathbf{1} - f \oslash v^*).$$

Lastly, by substituting (21e) into (43c) and taking the limit, we have

$$z^* \in \partial\big(\|w^*\|_1 - \alpha\|w^*\|_{2,1}\big)$$

by closedness of the subdifferential. By continuity, we also have Au* = v* and ∇u* = w*. Together, these relations show that (u*, v*, w*, y*, z*) satisfies the first-order optimality conditions of (18), i.e., it is a stationary point.
Remark 2. The assumptions in Theorem 9 are admittedly rather strong, but they are standard in the convergence analyses of other ADMM algorithms for nonconvex problems that fail to satisfy the conditions for global convergence in [69]. For example, [34,35,39,43] assumed convergence of the successive differences of the primal variables and Lagrange multipliers; we instead impose comparable assumptions on the convergence of these successive differences.

Grayscale, Multiphase Segmentation
We examine multiphase segmentation on the entire BrainWeb dataset [3]. Across all 20 images of the BrainWeb dataset, Table 2 reports the average DICE indices for cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM), along with the average computational time in seconds of the segmentation methods. For both cases (1) and (2), AITV SaT attains the highest average DICE indices for segmenting CSF, GM, and WM, while remaining comparable to TV SaT and T-ROF in terms of computational time.
The segmentation quality of the proposed AITV SaT/SLaT depends on three key parameters: the fidelity parameter λ, the smoothing parameter µ, and the sparsity parameter α.

• The smoothing parameter µ determines how smooth the solution u* should be. A larger value of µ may improve denoising, but at the cost of smearing the edges between adjacent regions, which may then be segmented together if they have similar colors.
• The sparsity parameter α ∈ [0, 1] determines how sparse the gradient vector at each pixel should be.
We perform sensitivity analysis on these parameters to understand how they affect the segmentation quality of AITV SaT/SLaT. We consider two types of tests in the case of P/8 with motion blur in Figure 3. In the first test (Figure 8), we fix µ = 1.0 and vary α and λ. In the second test (Figure 9), we fix λ = 5.0 and vary α and µ. Figure 8 reveals a concave relationship between the DICE index of each region and the parameter λ, which implies that an optimal choice of λ exists. Additionally, when λ is small, a large value of α can improve the DICE indices. According to Figure 9, the DICE indices of the GM and WM regions decrease with respect to µ, while the DICE index of the CSF region is approximately constant. For α = 0.8, the DICE indices of the GM and WM regions are the largest when µ ≥ 1, but such a large α is not optimal for CSF. Hence, an intermediate value of α, such as 0.6, is preferable to attain satisfactory segmentation quality for all three regions. Lastly, in Figure 10, we conduct sensitivity analysis for the case of P = 10 of Figure 5. We fix µ = 0.05 while varying λ and α in Figure 10(A), which indicates that the optimal value of α lies in the range 0.5 ≤ α ≤ 0.7.

Figure 1: The entire DRIVE dataset [59] for binary segmentation. Each image has size 584 × 565, with pixel value 200 for the background and 255 for the vessels.

Grayscale, Binary Segmentation

We start by performing binary segmentation on the entire DRIVE dataset [59], which consists of 20 images shown in Figure 1. Each image has size 584 × 565 with modified pixel values of either 200 for the background or 255 for the vessels. Before adding Poisson noise, we set the peak value of the image to P/2 or P/5, where P = 255. Note that a lower peak value indicates stronger noise in the image, making denoising more challenging. We examine three cases: (1) P/2 without blur, (2) P/5 without blur, and (3) P/2 with Gaussian blur specified by the MATLAB command fspecial('gaussian', [10 10], 2). For the TV SaT method, we set λ = 14.5, µ = 0.5 for case (1), λ = 8.0, µ = 0.5 for case (2), and λ = 22.5, µ = 0.25 for case (3). For the AITV SaT method, the parameters λ and µ are set the same as for TV SaT, with α = 0.3 for cases (1)-(2) and α = 0.8 for case (3).
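As a rough Python analogue of this setup (the experiments themselves use MATLAB), the degradation can be sketched as follows; scipy's gaussian_filter only approximates fspecial('gaussian', [10 10], 2) up to window truncation and boundary handling, so this is an illustrative stand-in rather than an exact reproduction.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_drive(image, peak, blur=False, seed=0):
    """Rescale to the prescribed peak, optionally blur with a std-2 Gaussian,
    then corrupt with Poisson noise, mimicking the test setup."""
    g = image.astype(float) * (peak / image.max())
    if blur:
        g = gaussian_filter(g, sigma=2.0, truncate=2.5)  # roughly a 10x10 support
    return np.random.default_rng(seed).poisson(g).astype(float)
```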

Figure 2: Binary segmentation results of Figure 1 with peak P/2 under Gaussian blur and Poisson noise.

Figure 4: Segmentation results of the first image in Figure 3 for case (1). When segmenting CSF, the methods TV SaT, AITV SaT, and Storath yield similar visual results, while Pock fails to segment roughly half of the region. In addition, AITV SaT segments the most GM region with the least amount of

Figure 8: Sensitivity analysis on λ for the P/8 with motion blur case of Figure 3. The parameter µ = 1.0 is fixed. DICE indices, averaged over 10 images for each brain region, are plotted with respect to λ.

Figure 9: Sensitivity analysis on µ for the P/8 with motion blur case of Figure 3. The parameter λ = 5.0 is fixed. DICE indices, averaged over 10 images for each brain region, are plotted with respect to µ.

Figure 10: Sensitivity analysis of the parameters for the P = 10 case of Figure 5. (A) Sensitivity analysis on λ with µ = 0.05 fixed; (B) sensitivity analysis on µ with λ = 1.5 fixed. Average PSNR is plotted.

Table 1 records the DICE indices and the computational time in seconds of the competing methods, averaged over 20 images. We observe that AITV SaT attains the best DICE indices for all three cases, with computational time comparable to TV SaT and T-ROF, all of which are much faster than Pock and Storath. As visually illustrated in Figure 2, AITV SaT segments more of the thinner vessels than TV SaT and T-ROF in five images, thereby attaining higher average DICE indices.

Table 1: DICE indices and computational time in seconds of the binary segmentation methods, averaged over 20 images in Figure 1, with standard deviations in parentheses. Bold indicates the best result.

Table 2: DICE indices and computational time in seconds of the multiphase segmentation methods, averaged over 20 images in Figure 3, with standard deviations in parentheses. Bold indicates the best result.

Table 3: PSNR and computational time in seconds for color image segmentation of the images in Figure 5.