A New Nonconvex Sparse Recovery Method for Compressive Sensing

As an extension of the widely used $\ell_r$-minimization with $0 < r \le 1$, a new non-convex weighted $\ell_r - \ell_1$ minimization method is proposed for compressive sensing. Theoretical recovery results based on the restricted isometry property and the q-ratio constrained minimal singular values are established. An algorithm that integrates the iteratively reweighted least squares (IRLS) algorithm and the difference of convex functions algorithm (DCA) is given to approximately solve this non-convex problem. Numerical experiments are presented to illustrate our results.


INTRODUCTION
Compressive sensing (CS) has attracted a great deal of interest since its advent [1, 2]; see the monographs [3, 4] and the references therein for a comprehensive view. Basically, the goal of CS is to recover an unknown (approximately) sparse signal $x \in \mathbb{R}^N$ from the noisy underdetermined linear measurements
$$y = Ax + e \in \mathbb{R}^m \tag{1}$$
with $m \ll N$, where $A \in \mathbb{R}^{m \times N}$ is the pre-given measurement matrix and $e \in \mathbb{R}^m$ is the noise vector. If the measurement matrix $A$ satisfies some kind of incoherence condition (e.g., the mutual coherence condition [5, 6], the restricted isometry property (RIP) [7, 8], the null space property (NSP) [9, 10], or the constrained minimal singular value (CMSV) [11, 12]), then stable (w.r.t. sparsity defect) and robust (w.r.t. measurement error) recovery can be guaranteed by the constrained $\ell_1$-minimization [13]:
$$\min_{z \in \mathbb{R}^N} \|z\|_1 \quad \text{subject to} \quad \|Az - y\|_2 \le \eta. \tag{2}$$
Here the $\ell_1$-minimization problem works as a convex relaxation of the $\ell_0$-minimization problem, which is NP-hard to solve [14]. Meanwhile, non-convex recovery algorithms such as $\ell_r$-minimization ($0 < r < 1$) have been proposed to enhance sparsity [15-20]. $\ell_r$-minimization enables one to reconstruct the sparse signal from fewer measurements than the convex $\ell_1$-minimization, although it is more challenging to solve because of its non-convexity. Fortunately, an iteratively reweighted least squares (IRLS) algorithm can be applied to approximately solve this non-convex problem in practice [21, 22].
As an extension of the $\ell_r$-minimization, we study in this paper the following weighted $\ell_r - \ell_1$ minimization problem for sparse signal recovery:
$$\min_{z \in \mathbb{R}^N}\ \|z\|_r^r - \alpha\|z\|_1^r \quad \text{subject to} \quad \|Az - y\|_2 \le \eta, \tag{3}$$
where $y = Ax + e$ with $\|e\|_2 \le \eta$, $0 \le \alpha \le 1$, and $0 < r \le 1$.
Throughout the paper, we assume that $\alpha \ne 1$ when $r = 1$ (otherwise the objective in (3) vanishes identically).
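To fix ideas, the following minimal Python sketch evaluates the objective and the constraint of (3); all names and parameter values here are illustrative rather than part of the original method.

```python
import numpy as np

def weighted_lr_l1(z, r=0.5, alpha=0.2):
    """The weighted l_r - l_1 objective ||z||_r^r - alpha * ||z||_1^r."""
    z = np.abs(np.asarray(z, dtype=float))
    return np.sum(z**r) - alpha * np.sum(z)**r

def is_feasible(z, A, y, eta):
    """The data-fidelity constraint ||Az - y||_2 <= eta of problem (3)."""
    return np.linalg.norm(A @ z - y) <= eta
```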
Obviously, (3) reduces to the traditional $\ell_r$-minimization problem when $\alpha = 0$. This hybrid model is inspired by the non-convex Lipschitz-continuous $\ell_1 - \ell_2$ model (minimizing the difference of the $\ell_1$ norm and the $\ell_2$ norm) proposed in Lou et al. [23] and Yin et al. [24], which improves on $\ell_1$-minimization in a robust manner, especially for highly coherent measurement matrices. Roughly speaking, the underlying logic of adopting these kinds of norm differences or ratios of norms [25] comes from the fact that they can be viewed as sparsity measures; see the effective sparsity measure called q-ratio sparsity (involving the ratio of the $\ell_1$ norm and the $\ell_q$ norm) defined later in Definition 2 of section 2.2. Other recent related literature includes [26-29], to name a few.
To illustrate these weighted $\ell_r - \ell_1$ norms ($\|\cdot\|_r^r - \alpha\|\cdot\|_1^r$), we present their corresponding contour plots in Figure 1. As is shown, different non-convex patterns arise as the difference weight $\alpha$ or the norm order $r$ varies, and the level curves of the weighted $\ell_r - \ell_1$ norms approach the x and y axes as the norm values get small, which reflects their ability to promote sparsity. In the present paper, we focus on both the theoretical aspects and the computational study of this non-convex sparse recovery method. The paper is organized as follows. In section 2, we derive the theoretical performance bounds for the weighted $\ell_r - \ell_1$ minimization based on both the r-RIP and the q-ratio CMSV. In section 3, we give an algorithm to approximately solve the unconstrained version of the weighted $\ell_r - \ell_1$ minimization problem. Numerical experiments are provided in section 4. Section 5 concludes with a brief summary and an outlook on future extensions.
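Contour plots of the kind shown in Figure 1 can be reproduced with a short script; the sketch below is ours, with the grid range and the values of $r$ and $\alpha$ chosen arbitrarily.

```python
import numpy as np
import matplotlib.pyplot as plt

r, alpha = 0.5, 0.5                      # norm order and difference weight
t = np.linspace(-1.0, 1.0, 401)
X, Y = np.meshgrid(t, t)
# Weighted l_r - l_1 "norm" on R^2: (|x|^r + |y|^r) - alpha * (|x| + |y|)^r
Z = (np.abs(X)**r + np.abs(Y)**r) - alpha * (np.abs(X) + np.abs(Y))**r

plt.contour(X, Y, Z, levels=10)
plt.gca().set_aspect("equal")
plt.title(r"Contours of $\|z\|_r^r - \alpha\|z\|_1^r$")
plt.show()
```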

RECOVERY ANALYSIS
In this section, we establish theoretical performance bounds for the reconstruction error of the weighted $\ell_r - \ell_1$ minimization problem, based on both the r-RIP and the q-ratio CMSV. Hereafter, we say a signal $x \in \mathbb{R}^N$ is $s$-sparse if $\|x\|_0 = \sum_{i=1}^N \mathbb{1}\{x_i \ne 0\} \le s$, and we denote by $x_S$ the vector that coincides with $x$ on the indices in $S \subseteq [N] := \{1, 2, \cdots, N\}$ and is zero outside $S$.

r-RIP
We start with the definition of the s-th r-restricted isometry constant, which was introduced in Chartrand and Staneva [30].
Definition 1. ([30]) For integer $s > 0$ and $0 < r \le 1$, the $s$-th r-restricted isometry constant (RIC) $\delta_s = \delta_s(A)$ of a matrix $A \in \mathbb{R}^{m \times N}$ is defined as the smallest $\delta \ge 0$ such that
$$(1 - \delta)\|x\|_2^r \ \le\ \|Ax\|_r^r \ \le\ (1 + \delta)\|x\|_2^r$$
for all $s$-sparse vectors $x \in \mathbb{R}^N$.
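Computing $\delta_s$ exactly is combinatorial, but a crude Monte Carlo estimate over random $s$-sparse vectors can illustrate the definition. The sketch below assumes the two-sided inequality in the form reconstructed above and, since it only samples, can only under-estimate the true constant; the helper name is ours.

```python
import numpy as np

def rric_lower_estimate(A, s, r=0.5, trials=2000, seed=0):
    """Monte Carlo lower bound on the s-th r-RIC delta_s(A): sample random
    s-sparse x and track the worst deviation of ||Ax||_r^r / ||x||_2^r from 1."""
    rng = np.random.default_rng(seed)
    N = A.shape[1]
    delta = 0.0
    for _ in range(trials):
        supp = rng.choice(N, size=s, replace=False)
        x = np.zeros(N)
        x[supp] = rng.standard_normal(s)
        ratio = np.sum(np.abs(A @ x) ** r) / np.linalg.norm(x) ** r
        delta = max(delta, abs(ratio - 1.0))
    return delta
```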
Then, the r-RIP means that the $s$-th r-RIC $\delta_s$ is small for reasonably large $s$. In Chartrand and Staneva [30], the authors established the recovery analysis for the $\ell_r$-minimization problem based on this r-RIP. To extend it to the weighted $\ell_r - \ell_1$ minimization problem, the following lemma plays a crucial role.
Lemma 1. For any $x \in \mathbb{R}^N$, $0 < r \le 1$, and $0 \le \alpha \le 1$, it holds that
$$\left(N - \alpha N^r\right)\min_{i \in [N]}|x_i|^r \ \le\ \|x\|_r^r - \alpha\|x\|_1^r \ \le\ \left(1 - \alpha N^{r-1}\right)\|x\|_r^r. \tag{5}$$
In particular, when $S = \mathrm{supp}(x) \subseteq [N]$ and $|S| = s$, then
$$\left(s - \alpha s^r\right)\min_{i \in S}|x_i|^r \ \le\ \|x\|_r^r - \alpha\|x\|_1^r \ \le\ \left(1 - \alpha s^{r-1}\right)\|x\|_r^r. \tag{6}$$
Proof. The right-hand side of (5) follows immediately from the norm inequality $\|x\|_r \le N^{1/r-1}\|x\|_1$ for any $x \in \mathbb{R}^N$ and $0 < r \le 1$, which gives $\alpha\|x\|_1^r \ge \alpha N^{r-1}\|x\|_r^r$. For the left-hand side, we may assume all entries of $x$ are non-zero (otherwise it is trivial). By denoting $a_j = |x_j| / \min_{i \in [N]}|x_i|$, we have $a_j \ge 1$ for any $1 \le j \le N$, and to show the left-hand side of (5) it suffices to show
$$\sum_{j=1}^N a_j^r - \alpha\Big(\sum_{j=1}^N a_j\Big)^r \ \ge\ N - \alpha N^r. \tag{7}$$
Consider the function $f(a_1, a_2, \cdots, a_N) = \sum_{j=1}^N a_j^r - \alpha\big(\sum_{j=1}^N a_j\big)^r$ on $\{a_j \ge 1\}$. Then, as a result of
$$\frac{\partial f}{\partial a_j} = r a_j^{r-1} - \alpha r\Big(\sum_{i=1}^N a_i\Big)^{r-1} \ \ge\ r a_j^{r-1} - r a_j^{r-1} = 0$$
(using $\alpha \le 1$ and $\big(\sum_i a_i\big)^{r-1} \le a_j^{r-1}$ since $r \le 1$), we have $f(a_1, a_2, \cdots, a_N) \ge f(1, 1, \cdots, 1) = N - \alpha N^r$. Thus, the left-hand side of (5) holds and the proof is completed; (6) follows as we apply (5) to $x_S$. Now, we are ready to present the r-RIP based bound for the $\ell_2$ norm of the reconstruction error.
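Before doing so, we note that the two-sided bound (5) is easy to sanity-check numerically; below is a minimal sketch over random fully supported vectors, with illustrative sizes and parameters of our own choosing.

```python
import numpy as np

rng = np.random.default_rng(1)
r, alpha, N = 0.5, 0.7, 50
for _ in range(10000):
    x = np.abs(rng.standard_normal(N)) + 1e-6      # fully supported vector
    val = np.sum(x**r) - alpha * np.sum(x)**r      # ||x||_r^r - alpha*||x||_1^r
    lhs = (N - alpha * N**r) * x.min()**r          # left-hand side of (5)
    rhs = (1 - alpha * N**(r - 1)) * np.sum(x**r)  # right-hand side of (5)
    assert lhs <= val + 1e-9 and val <= rhs + 1e-9
print("Lemma 1 bounds verified on all samples.")
```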

Theorem 1. Let the $\ell_r$-error of the best $s$-term approximation of $x$ be
$$\sigma_s(x)_r = \min_{z \in \mathbb{R}^N:\ \|z\|_0 \le s} \|x - z\|_r,$$
and suppose the measurement matrix $A$ satisfies a suitable smallness condition on its r-RICs $\delta_M$ and $\delta_{s+M}$ (with the block size $M$ specified in the proof below). Then any solution $\hat{x}$ to the minimization problem (3) obeys a stable and robust error bound in which $\|\hat{x} - x\|_2$ is controlled by the noise level $\eta$ and the sparsity defect $\sigma_s(x)_r$.

Proof.
We assume that $S$ is the index set that contains the largest $s$ absolute entries of $x$, so that $\sigma_s(x)_r = \|x_{S^c}\|_r$, and let $h = \hat{x} - x$.
Then, since $x$ is feasible for (3) and $\hat{x}$ is a minimizer, $\|\hat{x}\|_r^r - \alpha\|\hat{x}\|_1^r \le \|x\|_r^r - \alpha\|x\|_1^r$, which together with the r-triangle inequality yields
$$\|h_{S^c}\|_r^r \ \le\ \|h_S\|_r^r + \alpha\|h\|_1^r + 2\|x_{S^c}\|_r^r. \tag{11}$$
Using Hölder's inequality, we obtain $\|h_S\|_r^r \le s^{1-r/2}\|h_S\|_2^r$. By $\|A\hat{x} - y\|_2 \le \eta$, $\|Ax - y\|_2 = \|e\|_2 \le \eta$, and the triangle inequality,
$$\|Ah\|_2 \ \le\ \|A\hat{x} - y\|_2 + \|Ax - y\|_2 \ \le\ 2\eta. \tag{12}$$
Thus, $\|h_{S^c}\|_r^r$ is controlled in terms of $\|h_S\|_2$, $\|h\|_1$, and $\sigma_s(x)_r$. Arrange $S^c = S_1 \cup S_2 \cup \cdots$, where $S_1$ is the index set of the $M := as$ largest absolute entries of $h$ in $S^c$ (with the integer $a \ge 1$ to be chosen), $S_2$ is the index set of the $M$ largest absolute entries of $h$ in $(S \cup S_1)^c$, etc., and denote $S_0 = S \cup S_1$. Then, by adopting Lemma 1, for each $i \in S_k$, $k \ge 2$,
$$|h_i|^r \ \le\ \min_{j \in S_{k-1}}|h_j|^r \ \le\ \frac{\|h_{S_{k-1}}\|_r^r - \alpha\|h_{S_{k-1}}\|_1^r}{M - \alpha M^r},$$
and thus we can bound $\sum_{k \ge 2}\|h_{S_k}\|_2$ accordingly. Meanwhile, according to the definition of the r-RIC, $\|Ah_{S_0}\|$ is bounded from below in terms of $(1 - \delta_{s+M})$ and $\|h_{S_0}\|_2$ (16). Thus, by using (16), it follows that $\|h_{S_0}\|_2$ is controlled by $\eta$ and $\sigma_s(x)_r$ (17). On the other hand, an analogous bound holds for the tail $\|h_{S_0^c}\|_2$ (18). Since $(v_1^r + v_2^r)^{1/r} \le 2^{1/r-1}(v_1 + v_2)$ for any $v_1, v_2 \ge 0$, combining (17) and (18) gives the claimed error bound. The proof is completed.
Based on this theorem, we can obtain a corollary for exact recovery by assuming that the original signal $x$ is $s$-sparse ($\sigma_s(x)_r = 0$) and that the measurement is noise-free ($e = 0$ and $\eta = 0$); it acts as a natural generalization of Theorem 2.4 in Chartrand and Staneva [30] from the case $\alpha = 0$ to any $\alpha \in [0, 1]$. Noticing that the constant in our sufficient condition is strictly smaller than $a^{1 - r/2}$ when $\alpha \in (0, 1]$, the sufficient condition established here is slightly stronger than that for the traditional $\ell_r$-minimization in Chartrand and Staneva [30] if $\alpha \in (0, 1]$.

q-Ratio CMSV
Before discussing the q-ratio CMSV, we first present the definition of the q-ratio sparsity, a kind of effective sparsity measure. We list the detailed statement here for the sake of completeness.
Definition 2. ([25]) For any non-zero $x \in \mathbb{R}^N$ and $q \in (1, \infty]$, the q-ratio sparsity of $x$ is defined as
$$s_q(x) = \left(\frac{\|x\|_1}{\|x\|_q}\right)^{\frac{q}{q-1}},$$
with the limiting case $s_\infty(x) = \|x\|_1 / \|x\|_\infty$. Note that $s_q(x) \le \|x\|_0$ for any non-zero $x$, with equality when the non-zero entries of $x$ have equal magnitude.
We are able to establish performance bounds for both the $\ell_q$ norm and the $\ell_r$ norm of the reconstruction error via a recently developed computable incoherence measure of the measurement matrix, called the q-ratio CMSV. Its definition is given as follows.
Definition 3. ([12]) For any real $s \in [1, N]$ and $1 < q \le \infty$, the q-ratio constrained minimal singular value (CMSV) of $A \in \mathbb{R}^{m \times N}$ is defined as
$$\rho_{q,s}(A) = \min_{x \ne 0,\ s_q(x) \le s} \frac{\|Ax\|_2}{\|x\|_q}.$$
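A direct implementation of Definition 2 is straightforward (the CMSV in Definition 3 involves a non-convex minimization and is not computed here); the helper name below is ours.

```python
import numpy as np

def q_ratio_sparsity(x, q=2.0):
    """q-ratio sparsity s_q(x) = (||x||_1 / ||x||_q)^(q/(q-1)) for 1 < q < inf;
    for q = inf it reduces to ||x||_1 / ||x||_inf."""
    x = np.asarray(x, dtype=float)
    if np.isinf(q):
        return np.abs(x).sum() / np.abs(x).max()
    return (np.abs(x).sum() / np.linalg.norm(x, q)) ** (q / (q - 1.0))

# An s-sparse vector with equal-magnitude entries has s_q(x) = s:
x = np.zeros(100); x[:5] = 3.0
print(q_ratio_sparsity(x, q=2))   # -> 5.0
```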
Then, when the signal is exactly sparse, we have the following q-ratio CMSV based sufficient condition and upper bounds for the reconstruction error, which are much more concise to obtain than the r-RIP based ones.

Theorem 2.
For any $1 < q \le \infty$, $0 \le \alpha \le 1$, and $0 < r \le 1$, if the signal $x$ is $s$-sparse and the measurement matrix $A$ satisfies the condition
$$\rho_{q,\tilde{s}}(A) > 0, \quad \text{where } \tilde{s} := \Big(\frac{2}{1-\alpha}\Big)^{\frac{q}{r(q-1)}} s^{\frac{q-r}{r(q-1)}}, \tag{22}$$
then any solution $\hat{x}$ to the minimization problem (3) obeys
$$\|\hat{x} - x\|_q \ \le\ \frac{2\eta}{\rho_{q,\tilde{s}}(A)}, \tag{23}$$
$$\|\hat{x} - x\|_r \ \le\ \Big(\frac{2^{r+1}}{1-\alpha}\Big)^{1/r} \cdot \frac{s^{1/r - 1/q}\,\eta}{\rho_{q,\tilde{s}}(A)}. \tag{24}$$
Proof. Suppose the support of $x$ is $S$ with $|S| \le s$ and let $h = \hat{x} - x$. Then, based on (11) with $\sigma_s(x)_r = 0$, we have
$$\|h_{S^c}\|_r^r \ \le\ \|h_S\|_r^r + \alpha\|h\|_1^r.$$
Hence, for any $1 < q \le \infty$, it holds that
$$(1-\alpha)\|h\|_1^r \ \le\ \|h\|_r^r - \alpha\|h\|_1^r \ \le\ 2\|h_S\|_r^r \ \le\ 2 s^{1-r/q}\|h\|_q^r, \tag{25}$$
where we used $\|h\|_1^r \le \|h\|_r^r$ and Hölder's inequality. As a consequence,
$$s_q(h) = \left(\frac{\|h\|_1}{\|h\|_q}\right)^{\frac{q}{q-1}} \ \le\ \Big(\frac{2}{1-\alpha}\Big)^{\frac{q}{r(q-1)}} s^{\frac{q-r}{r(q-1)}} = \tilde{s}.$$
Therefore, according to the definition of the q-ratio CMSV, the condition (22), and the fact that $\|Ah\|_2 \le 2\eta$ [see (12)], we can obtain
$$\|h\|_q \ \le\ \frac{\|Ah\|_2}{\rho_{q,\tilde{s}}(A)} \ \le\ \frac{2\eta}{\rho_{q,\tilde{s}}(A)},$$
which completes the proof of (23). In addition, $(1-\alpha)\|h\|_r^r \le \|h\|_r^r - \alpha\|h\|_1^r \le 2 s^{1-r/q}\|h\|_q^r$ yields
$$\|h\|_r \ \le\ \Big(\frac{2}{1-\alpha}\Big)^{1/r} s^{1/r - 1/q}\,\|h\|_q.$$
Therefore, (24) holds and the proof is completed.
Remarks. Note that the results (11) and (12) in Theorem 1 of Zhou and Yu [12] correspond to the special case $\alpha = 0$ and $r = 1$ of this result. As a by-product of this theorem, perfect recovery is guaranteed for any $s$-sparse signal $x$ via (3) with $\eta = 0$ whenever there exists some $q \in (1, \infty]$ such that the q-ratio CMSV of the measurement matrix $A$ fulfills $\rho_{q,\tilde{s}}(A) > 0$, with $\tilde{s}$ as in (22). As studied in Zhou and Yu [12, 32], this kind of q-ratio CMSV based sufficient condition holds with high probability for subgaussian and a class of structured random matrices as long as the number of measurements is reasonably large.
Next, we extend the result to the case that $x$ is compressible, i.e., not exactly sparse but well-approximated by an exactly sparse signal.

Theorem 3. For any $1 < q \le \infty$, $0 \le \alpha \le 1$, and $0 < r \le 1$, if the measurement matrix $A$ satisfies the condition
$$\rho_{q,\bar{s}}(A) > 0, \quad \text{where } \bar{s} := \Big(\frac{4}{1-\alpha}\Big)^{\frac{q}{r(q-1)}} s^{\frac{q-r}{r(q-1)}}, \tag{29}$$
then any solution $\hat{x}$ to the minimization problem (3) fulfills
$$\|\hat{x} - x\|_r \ \le\ \Big(\frac{4}{1-\alpha}\Big)^{1/r}\left(\frac{2\, s^{1/r - 1/q}\,\eta}{\rho_{q,\bar{s}}(A)} + \sigma_s(x)_r\right). \tag{30}$$

Proof. We assume that $S$ is the index set that contains the largest $s$ absolute entries of $x$, so that $\sigma_s(x)_r = \|x_{S^c}\|_r$, and let $h = \hat{x} - x$. Based on (11) and Hölder's inequality, we now have
$$(1-\alpha)\|h\|_r^r \ \le\ \|h\|_r^r - \alpha\|h\|_1^r \ \le\ 2 s^{1-r/q}\|h\|_q^r + 2\sigma_s(x)_r^r. \tag{32}$$
If $\sigma_s(x)_r^r \le s^{1-r/q}\|h\|_q^r$, then (32), applied together with $\|h\|_1^r \le \|h\|_r^r$, gives $(1-\alpha)\|h\|_1^r \le 4 s^{1-r/q}\|h\|_q^r$, so that $s_q(h) \le \bar{s}$; the condition (29) and $\|Ah\|_2 \le 2\eta$ [see (12)] then yield $\|h\|_q \le 2\eta/\rho_{q,\bar{s}}(A)$, and plugging this back into (32) gives $(1-\alpha)\|h\|_r^r \le 4 s^{1-r/q}\big(2\eta/\rho_{q,\bar{s}}(A)\big)^r$. Otherwise, $\sigma_s(x)_r^r > s^{1-r/q}\|h\|_q^r$, and (32) directly gives $(1-\alpha)\|h\|_r^r \le 4\sigma_s(x)_r^r$. Therefore, we have in both cases
$$\|h\|_r \ \le\ \Big(\frac{4}{1-\alpha}\Big)^{1/r}\left(\frac{2\, s^{1/r - 1/q}\,\eta}{\rho_{q,\bar{s}}(A)} + \sigma_s(x)_r\right),$$
which completes the proof of (30).
Remarks. When we select $\alpha = 0$ and $r = 1$, our results reduce to the corresponding results for the $\ell_1$-minimization, or Basis Pursuit, in Theorem 2 of Zhou and Yu [12]. In general, the sufficient conditions provided here and in Theorem 2 are slightly stronger than those established for the $\ell_1$-minimization in Zhou and Yu [12], noticing that
$$\Big(\frac{2}{1-\alpha}\Big)^{\frac{q}{r(q-1)}} s^{\frac{q-r}{r(q-1)}} \ \ge\ 2^{\frac{q}{q-1}}\, s$$
for any $1 < q \le \infty$, $0 \le \alpha \le 1$, and $0 < r \le 1$. This is caused by the fact that the technical inequalities used, like (25) and (32), are far from tight, and the same is true for the r-RIP based analysis. In fact, both the r-RIP and the q-ratio CMSV based conditions are loose. The discussion of much tighter sufficient conditions, such as the NSP based conditions investigated in Tran and Webster [33], is left for future work.
ALGORITHM

To approximately solve the problem in practice, we work with the unconstrained version of (3),
$$\min_{x \in \mathbb{R}^N}\ \frac{1}{2}\|Ax - y\|_2^2 + \lambda\left(\|x\|_r^r - \alpha\|x\|_1^r\right),$$
with regularization parameter $\lambda > 0$, and handle the non-smooth, non-convex term $\|x\|_r^r$ by an outer IRLS loop: at the $(n+1)$-th outer iteration it is replaced by a smoothed reweighted quadratic surrogate, which yields the subproblem (38). As for the inner loop used to solve (38), we view it as the minimization of a difference of two convex functions, that is, the objective of (38) is split as $G(x) - H(x)$, where $H(x) = \lambda\alpha\|x\|_1^r$ collects the subtracted term, and the difference of convex functions algorithm (DCA) is applied. We start with $x^{n+1,0} = 0$. For $k = 0, 1, 2, \cdots$, in the $(k+1)$-th step, we linearize $H(x)$ with the approximation $H(x^{n+1,k}) + \langle y^{n+1,k}, x - x^{n+1,k}\rangle$, where $y^{n+1,k} \in \partial H(x^{n+1,k})$, i.e., $y^{n+1,k}$ is a subgradient of $H(x)$ at $x^{n+1,k}$; for $x^{n+1,k} \ne 0$ one may take
$$y^{n+1,k} = \lambda\alpha r\,\|x^{n+1,k}\|_1^{r-1}\,\mathrm{sign}(x^{n+1,k}),$$
where $\mathrm{sign}(\cdot)$ is the sign function applied componentwise. Minimizing the resulting convex surrogate gives the next inner iterate $x^{n+1,k+1}$. The termination criterion for the inner loop is set to be
$$\frac{\|x^{n+1,k+1} - x^{n+1,k}\|_2}{\max\{\|x^{n+1,k}\|_2,\ 1\}} < \delta$$
for some given tolerance parameter $\delta > 0$. Basically, this algorithm can be regarded as a generalized version of the IRLS algorithm; obviously, when $\alpha = 0$ it exactly reduces to the traditional IRLS algorithm used for solving the $\ell_r$-minimization problem.
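For concreteness, the following Python sketch assembles the whole scheme (outer IRLS reweighting plus inner DCA linearization). The objective scaling, the IRLS weights $w_i = (x_i^2 + \epsilon^2)^{r/2-1}$, the smoothing schedule for $\epsilon$, and all parameter defaults are our own illustrative assumptions, not the authors' exact implementation choices.

```python
import numpy as np

def irls_dca(A, y, r=0.5, alpha=0.2, lam=1e-6, eps=1.0,
             n_outer=30, n_inner=50, delta=1e-3):
    """Sketch of the generalized IRLS-DCA scheme for the unconstrained problem
    min_x 0.5*||Ax - y||_2^2 + lam*(||x||_r^r - alpha*||x||_1^r)."""
    N = A.shape[1]
    AtA, Aty = A.T @ A, A.T @ y
    x = np.linalg.lstsq(A, y, rcond=None)[0]        # least-squares initialization
    for _ in range(n_outer):
        # Outer IRLS step: smoothed quadratic surrogate of ||x||_r^r,
        # sum_i w_i x_i^2 with w_i = (x_i^2 + eps^2)^(r/2 - 1)
        w = (x**2 + eps**2) ** (r / 2.0 - 1.0)
        xk = np.zeros(N)                            # inner loop starts from 0
        for _ in range(n_inner):
            # DCA step: linearize H(x) = lam*alpha*||x||_1^r at xk; its
            # (sub)gradient is lam*alpha*r*||xk||_1^(r-1)*sign(xk)
            g = lam * alpha * r * max(np.abs(xk).sum(), 1e-12) ** (r - 1.0) \
                * np.sign(xk)
            # Convex weighted least-squares subproblem, solved in closed form:
            # (A^T A + lam*r*diag(w)) x = A^T y + g
            xk_new = np.linalg.solve(AtA + lam * r * np.diag(w), Aty + g)
            stop = np.linalg.norm(xk_new - xk) / max(np.linalg.norm(xk), 1.0) < delta
            xk = xk_new
            if stop:
                break
        x = xk
        eps = max(eps / 10.0, 1e-8)                 # shrink the smoothing parameter
    return x
```

Each inner step solves a strictly convex weighted least squares problem (the weights are positive and $\lambda > 0$), so the linear system is always non-singular.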

NUMERICAL EXPERIMENTS
In this section, we conduct numerical experiments with the algorithm proposed in section 3 to illustrate the performance of the weighted $\ell_r - \ell_1$ minimization for simulated sparse signal recovery.

Successful Recovery
First, we focus on the weighted $\ell_r - \ell_1$ minimization itself. In this set of experiments, the $s$-sparse signal $x$ is of length $N = 256$ and is generated by choosing $s$ entries uniformly at random and then drawing the non-zero values on these entries from the standard normal distribution. The underdetermined linear measurements are $y = Ax + e \in \mathbb{R}^m$, where $A \in \mathbb{R}^{m \times N}$ is a standard Gaussian random matrix and the entries of the noise vector satisfy $e_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, $i = 1, 2, \cdots, m$. Here we fix the number of measurements $m = 64$ and let the sparsity level $s$ range over $10, 12, \cdots, 36$. We run the experiments for both noiseless and noisy cases. In all the experiments, we set the tolerance parameter $\delta = 10^{-3}$, and all results are averaged over 100 repetitions.
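The data-generating process just described can be reproduced as follows, using the illustrative `irls_dca` sketch from section 3 (the seed and the choice $r = 0.7$, $\alpha = 0.2$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2024)
N, m, s, sigma = 256, 64, 20, 0.0        # noiseless case: sigma = 0

# s-sparse ground truth: uniformly random support, standard normal nonzeros
x = np.zeros(N)
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)

A = rng.standard_normal((m, N))          # standard Gaussian measurement matrix
y = A @ x + sigma * rng.standard_normal(m)

x_hat = irls_dca(A, y, r=0.7, alpha=0.2, lam=1e-6, delta=1e-3)
print(np.linalg.norm(x_hat - x) / np.linalg.norm(x))  # relative l2 error
```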
In the noiseless case, i.e., $\sigma = 0$, we set $\lambda = 10^{-6}$. In Figure 2, we show the successful recovery rates for different $\alpha$ (i.e., $\alpha = 0, 0.2, 0.5, 0.8, 1$) while fixing $r$ but varying the sparsity level $s$. We regard a trial as a successful recovery if $\|\hat{x} - x\|_2 / \|x\|_2 < 10^{-3}$. We carry out the experiments for $r = 0.3$ and $r = 0.7$, respectively. As we can see, when $r$ is fixed, the influence of the weight $\alpha$ is negligible, especially when $r$ is relatively small. But the performance does improve in some scenarios when a proper weight $\alpha$ is used; however, the problem of adaptively selecting the optimal $\alpha$ appears challenging and is left for future work. In addition, we present the reconstruction performance for different $r$ (i.e., $r = 0.01, 0.2, 0.5, 0.8, 1$) while the weight $\alpha$ is fixed to 0.2 and 0.8 in Figure 3. Note that a small $r$ is favored when the weight $\alpha$ is fixed, and non-convex recovery with $0 < r < 1$ performs much better than the convex case ($r = 1$).
Next, we consider the noisy case with $\sigma = 0.01$ and set $\lambda = 10^{-4}$. We evaluate the recovery performance by the signal-to-noise ratio (SNR), given by
$$\mathrm{SNR} = 20\log_{10}\frac{\|x\|_2}{\|\hat{x} - x\|_2}.$$
As shown in Figures 4, 5, the findings mentioned above can still be observed in this setting.
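In code, the two evaluation criteria used above read as follows (illustrative helper names of our own):

```python
import numpy as np

def success(x_hat, x, tol=1e-3):
    """Successful recovery: relative l2 error below tol."""
    return np.linalg.norm(x_hat - x) / np.linalg.norm(x) < tol

def snr_db(x_hat, x):
    """Reconstruction SNR = 20*log10(||x||_2 / ||x_hat - x||_2) in dB."""
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x_hat - x))
```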

Algorithm Comparisons
Second, we compare the weighted $\ell_r - \ell_1$ minimization with some well-known algorithms. The following state-of-the-art recovery algorithms are considered: • ADMM-Lasso, see Boyd et al. [36].
We only consider exactly sparse signal recovery in the noiseless case and conduct the experiments under the same settings as in section 4.1. We present the successful recovery rates of all the algorithms under comparison.