Efficient Tuning-Free ℓ1-Regression of Nonnegative Compressible Signals

In compressed sensing the goal is to recover a signal from as few noisy linear measurements as possible, under the general assumption that the signal has only a few non-zero entries. The recovery can be performed by many different decoders; however, most of them rely on some tuning. Given an estimate for the noise level, a common convex approach to recover the signal is basis pursuit denoising. If the measurement matrix has the robust null space property with respect to the ℓ2-norm, basis pursuit denoising obeys stable and robust recovery guarantees. In the case of unknown noise levels, non-negative least squares recovers non-negative signals if the measurement matrix fulfills an additional property (sometimes called the M+ criterion). However, if the measurement matrix is the biadjacency matrix of a random left regular bipartite graph, it obeys with high probability the null space property with respect to the ℓ1-norm with optimal parameters. Therefore, we discuss the non-negative least absolute deviation (NNLAD), which is free of tuning parameters. For these measurement matrices we prove a uniform, stable and robust recovery guarantee. Such guarantees are important, since binary expander matrices are sparse and thus allow for fast sketching and recovery. We further present a method to solve the NNLAD numerically and show that it is comparable to state-of-the-art methods. Lastly, we explain how the NNLAD can be used for viral detection in the recent COVID-19 crisis.


INTRODUCTION
Since it was realized that many signals admit a sparse representation in some frame, the question arose whether such signals can be recovered from fewer samples than the dimension of the domain by utilizing the low dimensional structure of the signal. The question was answered positively at the beginning of the millennium [1, 2]. By now there are multiple decoders that recover a sparse signal from noisy measurements with robust recovery guarantees. Most of them, however, rely on some form of tuning, depending on either the signal or the noise.
The basis pursuit denoising requires an upper bound on the norm of the noise ([3], Theorem 4.22), the least absolute shrinkage and selection operator (LASSO) an estimate of the ℓ1-norm of the signal ([4], Theorem 11.1), and the Lagrangian version of the LASSO allegedly needs to be tuned to the order of the noise level ([4], Theorem 11.1). The expander iterative hard thresholding needs the sparsity of the signal or an estimate of the order of the expansion property ([3], Theorem 13.15). The order of the expansion property can be calculated from the measurement matrix, but no polynomial time method is known to do this. Variants of these methods have similar drawbacks. The non-negative basis pursuit denoising requires the same tuning parameter as the basis pursuit denoising [5]. Other thresholding based decoders like sparse matching pursuit and expander matching pursuit have the same limitations as the expander iterative hard thresholding [6].
If this side information is not known a priori, many decoders yield either no recovery guarantees or, in their imperfectly tuned versions, sub-optimal estimation errors ([3], Theorem 11.12). Even though the problem of sparse recovery from under-sampled measurements was answered long ago, finding tuning-free decoders that achieve robust recovery guarantees is still a topic of interest.
The most prominent achievement in this direction is the non-negative least squares (NNLS) [7-11]. It is completely tuning free [12], and in [13, 14] it was proven that it achieves robust recovery guarantees if the measurement matrix consists of independent biased sub-Gaussian random variables.

Our Contribution
We will replace the least squares in the NNLS with an arbitrary norm and obtain the non-negative least residual (NNLR). We use the methods of [13] to prove recovery guarantees under conditions similar to those of the NNLS. In particular, we consider the case where we minimize the ℓ1-norm of the residual (NNLAD) and give a recovery guarantee if the measurement matrix is a random walk matrix of a uniformly at random drawn D-left regular bipartite graph.
In general, our results state that if the M+ criterion is fulfilled, the basis pursuit denoising can be replaced by the tuning-less NNLR for non-negative signals. Note that the M+ criterion can be fulfilled by adding only one explicitly chosen measurement, if that is possible in the application. Thus, in practice the NNLR does not require more measurements than the BPDN to recover sparse signals. While biased sub-Gaussian measurement matrices rely on a probabilistic argument to verify that such a measurement is present, random walk matrices of left regular graphs naturally have such a measurement. The tuning-less nature gives the NNLR an advantage over other decoders if the noise power cannot be estimated, which is for instance the case when we have multiplicative noise or the measurements are Poisson distributed. Note that Laplacian distributed noise or the existence of outliers also favors an ℓ1 regression approach over an ℓ2 regression approach and thus motivates using the NNLAD over the NNLS.
Further, the sparse structure of left regular graphs can reduce the encoding and decoding time to a fraction. Using [15] we can solve the NNLAD with a first-order method applied to a single optimization problem with a sparse measurement matrix. Other state-of-the-art decoders often use non-convex optimization, computationally complex projections, or need to solve multiple different optimization problems. For instance, to solve the basis pursuit denoising given a tuning parameter, a common approach is to solve a sequence of ℓ1-constrained least residual problems to approximate where the Pareto curve attains the value of the tuning parameter of the basis pursuit denoising [16]. Cross-validation techniques suffer from similar issues [17].

Relations to Other Works
We build on the theory of [13], which uses the ℓ2 null space property and the M+ criterion. These methods have also been used in [12, 14]. To the best of the authors' knowledge, the M+ criterion has not been used with an ℓ1 null space property before. Other works have used adjacency matrices of graphs as measurement matrices, including [6, 18-21]. The works [18, 19] did not consider noisy observations. The decoder in [20] is the basis pursuit denoising and thus requires tuning depending on the noise power. [21] proposes two decoders for non-negative signals. The first is the non-negative basis pursuit, which could be extended to the non-negative basis pursuit denoising; however, this again needs a tuning parameter depending on the noise power. The second decoder, the Reverse Expansion Recovery algorithm, requires the order of the expansion property, which is not known to be computable in polynomial time. The survey [6] contains multiple decoders, including the basis pursuit, which again needs tuning depending on the noise power for robustness, and the expander matching pursuit and the sparse matching pursuit, which need the order of the expansion property. Further, [5] considered sparse regression of non-negative signals and also used the non-negative basis pursuit denoising as decoder, which again needs tuning dependent on the noise power. To the best of the authors' knowledge, this is the first work that considers tuning-less sparse recovery for random walk matrices of left regular bipartite graphs. The NNLAD has been considered in [22] with a structured sparsity model without the use of the M+ criterion.

PRELIMINARIES
For K ∈ N we denote the set of integers from 1 to K by [K]. For a set T ⊂ [N] we denote the number of elements of T by #(T). Vectors are denoted by lower case bold face symbols, while their corresponding components are denoted by lower case italic letters. Matrices are denoted by upper case bold face symbols, while their corresponding components are denoted by upper case italic letters. For x ∈ R^N we denote its ℓp-norms by ||x||_p. Given A ∈ R^{M×N} we denote its operator norm as an operator from ℓq to ℓp by ||A||_{q→p} := sup_{v∈R^N, ||v||_q≤1} ||Av||_p. By R^N_+ we denote the non-negative orthant. Given a closed convex set C ⊂ R^N, we denote the projection onto C, i.e., the unique minimizer of argmin_{z∈C} (1/2)||z − v||_2^2, by P_C(v). For a vector x ∈ R^N and a set T ⊂ [N], x|_T denotes the vector in R^N whose nth component is x_n if n ∈ T and 0 else. Given N, S ∈ N we will often need sets T ⊂ [N] with #(T) ≤ S, and we abbreviate this by #(T) ≤ S if no confusion is possible.
Given a measurement matrix A ∈ R^{M×N}, a decoder is any map Q_A : R^M → R^N. We refer to x ∈ R^N as the signal. If x ∈ R^N_+ := {z ∈ R^N : z_n ≥ 0 for all n ∈ [N]}, we say the signal is non-negative and write shortly x ≥ 0. If additionally x_n > 0 for all n ∈ [N], we write x > 0. An input of a decoder, i.e., any y ∈ R^M, is referred to as an observation. We allow all possible inputs of the decoder as observations, since in general the transmitted codeword Ax is disturbed by some noise. Thus, given a signal x and an observation y, we call e := y − Ax the noise. A signal x is called S-sparse if ||x||_0 := #({n ∈ [N] : x_n ≠ 0}) ≤ S. We denote the set of S-sparse vectors by Σ_S. Given some S ∈ [N], the compressibility of a signal x can be measured by d_1(x, Σ_S) := inf_{z∈Σ_S} ||x − z||_1. Given N and S, the general non-negative compressed sensing task is to find a measurement matrix A ∈ R^{M×N} and a decoder Q_A : R^M → R^N with M as small as possible such that the following holds true: There exist a q ∈ [1, ∞] and a continuous function C : [0, ∞) × [0, ∞) → [0, ∞) with C(0, 0) = 0 such that for all x ∈ R^N_+ and all e ∈ R^M the estimation error bound ||Q_A(Ax + e) − x||_1 ≤ C(d_1(x, Σ_S), ||e||_q) holds true. This will ensure that if we can control the compressibility and the noise, we can also control the estimation error and in particular decode every noiseless observation of an S-sparse signal exactly.
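These quantities are straightforward to compute. The following minimal Python sketch (helper names are ours, chosen for illustration) computes ||x||_0, the compressibility d_1(x, Σ_S) and the projection onto the non-negative orthant:

```python
import numpy as np

def l0_norm(x):
    """Number of non-zero entries ||x||_0."""
    return np.count_nonzero(x)

def d1_to_sparse(x, S):
    """Compressibility d_1(x, Sigma_S): the l1-distance of x to the set of
    S-sparse vectors, i.e. the sum of the N - S smallest magnitudes of x."""
    mags = np.sort(np.abs(x))            # ascending order
    return mags[:len(x) - S].sum()

def project_nonnegative(v):
    """Projection P_C onto the non-negative orthant C = R^N_+."""
    return np.maximum(v, 0.0)

# A compressible example: 2-sparse up to two tiny entries.
x = np.array([5.0, 0.01, -0.02, 3.0, 0.0])
print(l0_norm(x), d1_to_sparse(x, S=2), project_nonnegative(x))
```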

MAIN RESULTS
Given a measurement matrix A ∈ R^{M×N} and a norm ||·|| on R^M we define the decoder as follows: Given y ∈ R^M, set Q_A(y) to be any minimizer of argmin_{z≥0} ||Az − y||.
We call this problem the non-negative least residual (NNLR). In particular, for ||·|| = ||·||_1 this problem is called the non-negative least absolute deviation (NNLAD) and for ||·|| = ||·||_2 this problem is known as the non-negative least squares (NNLS) studied in [13]. In fact, we can translate the proof techniques fairly easily; we just need to introduce the dual norm.

Definition 3.1. Let ||·|| be a norm on R^M. The norm ||·||_* on R^M defined by ||v||_* := sup_{||u||≤1} ⟨v, u⟩ is called the dual norm to ||·||.
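For ||·|| = ||·||_1, the NNLAD can be recast as a linear program by introducing slack variables bounding |Az − y| componentwise (this recast is mentioned again in the section on numerical methods below). The following Python sketch, with illustrative function names and a hypothetical toy matrix, shows one such formulation using scipy:

```python
import numpy as np
from scipy.optimize import linprog

def nnlad_lp(A, y):
    """Solve min_{z >= 0} ||A z - y||_1 as a linear program.

    The variables are [z; s] with s >= |A z - y| componentwise, so the
    objective is the sum of the slack variables s."""
    M, N = A.shape
    c = np.concatenate([np.zeros(N), np.ones(M)])
    # A z - s <= y   and   -A z - s <= -y   together encode |A z - y| <= s.
    A_ub = np.block([[A, -np.eye(M)], [-A, -np.eye(M)]])
    b_ub = np.concatenate([y, -y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub)   # default bounds: all variables >= 0
    return res.x[:N]

# Tiny usage example with a hypothetical 3x4 matrix.
A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 0., 0.]])
x_true = np.array([0., 2., 0., 1.])
print(nnlad_lp(A, A @ x_true))   # should be close to x_true
```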
Note that the dual norm is indeed a norm. To obtain a recovery guarantee for the NNLR we have certain requirements on the measurement matrix A; we use a null space property.

Definition 3.2. Let S ∈ [N], q ∈ [1, ∞) and ||·|| be any norm on R^M. Further let A ∈ R^{M×N}. Suppose there exist constants ρ ∈ [0, 1) and τ ∈ [0, ∞) such that ||v|_T||_q ≤ ρ S^{1/q−1} ||v|_{T^c}||_1 + τ||Av|| for all v ∈ R^N and all T ⊂ [N] with #(T) ≤ S.
Then we say A has the ℓq-robust null space property of order S with respect to ||·||, or in short the ℓq-RNSP of order S with respect to ||·||, with constants ρ and τ. The constant ρ is called the stableness constant and τ the robustness constant.
Note that smaller stableness constants increase the reliability of recovery if many small non-zero components are present, while smaller robustness constants increase the reliability if the measurements are noisy. In order to make use of the non-negativity of the signal, we need A to be biased in a certain way. In [13] this bias was guaranteed with the M+ criterion.

Definition 3.3. Let A ∈ R^{M×N}. Suppose there exists t ∈ R^M such that A^T t > 0. Then we say A obeys the M+ criterion with vector t and constant κ := max_{n∈[N]} (A^T t)_n / min_{n∈[N]} (A^T t)_n.

Note that κ is exactly the condition number of the matrix with diagonal A^T t and 0 else. Condition numbers are frequently used in error bounds in numerical linear algebra. The general recovery guarantee is the following; similar results have been obtained in the matrix case in [23].
Theorem 3.4 (NNLR Recovery Guarantee). Let S ∈ [N], q ∈ [1, ∞) and ||·|| be any norm on R^M with dual norm ||·||_*. Further, suppose that A ∈ R^{M×N} obeys a) the ℓq-RNSP of order S with respect to ||·|| with constants ρ and τ and b) the M+ criterion with vector t and constant κ.
If κρ < 1, the following recovery guarantee holds true: For all x ∈ R^N_+ and y ∈ R^M any minimizer x# of argmin_{z≥0} ||Az − y|| obeys a stable and robust error bound in terms of the compressibility d_1(x, Σ_S) and the residual norm ||Ax − y||, with constants depending only on ρ, τ, κ, ||t||_* and max_{n∈[N]} (A^T t)_n^{−1}. If q = 1, this bound can be improved.

Proof. The proof can be found in Subsection 6.1.

Given a matrix with the ℓq-RNSP we can add a row of ones (or a row consisting of one minus the column sums of the matrix) to fulfill the M+ criterion with the optimal constant κ = 1. Certain random measurement matrices guarantee uniform bounds on κ for fixed vectors t. In ([13], Theorem 12) it was proven that if the A_{m,n} are i.i.d. 0/1 Bernoulli random variables, then A obeys the M+ criterion with t = (1, . . . , 1)^T ∈ R^M and κ ≤ 3 with high probability. This is problematic, since if κ > 1, the condition κρ < 1 might fail. Since the stableness constant ρ(S′), as a function of S′, is monotonically increasing, the condition κρ(S′) < 1 might only hold for S′ < S. If that is the case, there are vectors x ∈ Σ_S that are recovered by basis pursuit denoising but not by the NNLS! This is for instance the case for a certain matrix with three columns that obeys the M+ criterion only with κ ≥ 2 for every possible choice of t; in particular, the vector x = (0, 0, 1)^T is then not necessarily recovered by the NNLAD and the NNLS. Hence, it is crucial that the vector t is chosen to minimize κ and ideally achieves the optimal value κ = 1. This motivates us to use random walk matrices of regular graphs, since they achieve exactly this.
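Reading κ as the ratio of the largest to the smallest entry of A^T t, i.e., the condition number of the diagonal matrix mentioned above, a quick check of the M+ criterion for a candidate vector t can be sketched as follows (function name ours):

```python
import numpy as np

def m_plus_constant(A, t):
    """Check the M+ criterion for a given vector t and return kappa,
    read here as the condition number of diag(A^T t), i.e. the ratio of
    the largest to the smallest entry of A^T t (all entries must be positive)."""
    w = A.T @ t
    if np.any(w <= 0):
        raise ValueError("A^T t is not strictly positive: M+ criterion fails for this t")
    return w.max() / w.min()

# For a matrix whose columns all sum to one, t = (1, ..., 1)^T gives
# A^T t = (1, ..., 1)^T and hence kappa = 1.
A = np.array([[0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
print(m_plus_constant(A, np.ones(3)))   # 1.0
```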
Definition 3.5. Let S ∈ [N], D ∈ [M] and θ ∈ [0, 1]. Let A ∈ {0, 1}^{M×N} be the biadjacency matrix of a bipartite graph with N left vertices and M right vertices in which every left vertex has exactly D neighbors, i.e., every column of A has exactly D non-zero entries. For T ⊂ [N], Row(T) := {m ∈ [M] : A_{m,n} ≠ 0 for some n ∈ T} is called the set of right vertices connected to the set of left vertices T. If #(Row(T)) ≥ (1 − θ)D#(T) for all T ⊂ [N] with #(T) ≤ S holds true, then D^{−1}A is called a random walk matrix of an (S, D, θ)-lossless expander.

Note that we have made a slight abuse of notation. The term D-LRBG, short for D-left regular bipartite graph, refers in our case to the random walk matrix A and not to the graph itself. We omit this minor technical differentiation for the sake of shortening the frequently used term "random walk matrix of a D-left regular bipartite graph". Lossless expanders are bipartite graphs that have a small number of edges but are still highly connected, see for instance ([24], Chapter 4). As a consequence their random walk matrices have good properties for compressed sensing. It is well known that random walk matrices of (2S, D, θ)-lossless expanders obey the ℓ1-RNSP of order S with respect to ||·||_1, see ([3], Theorem 13.11). The dual norm of ||·||_1 is the norm ||·||_∞ and the M+ criterion is easily fulfilled, since the columns sum up to one. From Theorem 3.4 we can thus draw the following corollary.

Corollary 3.6 (Lossless Expander Recovery Guarantee). Let S ∈ [N] and θ ∈ [0, 1/6). If A ∈ {0, D^{−1}}^{M×N} is a random walk matrix of a (2S, D, θ)-lossless expander, then the following recovery guarantee holds true: For all x ∈ R^N_+ and y ∈ R^M any minimizer x# of argmin_{z≥0} ||Az − y||_1 obeys an error bound of the form ||x# − x||_1 ≤ C_1 d_1(x, Σ_S) + C_2 ||Ax − y||_1 with constants C_1, C_2 depending only on θ.

Proof. By ([3], Theorem 13.11), A has the ℓ1-RNSP with respect to ||·||_1 with constants ρ = 2θ/(1 − 4θ) and τ = 1/(1 − 4θ). The dual norm of the norm ||·||_1 is ||·||_∞. If we set t := (1, . . . , 1)^T ∈ R^M, we get A^T t = (1, . . . , 1)^T, since every column of A sums to one. Hence, A obeys the M+ criterion with vector t and constant κ = 1, and the condition κρ < 1 is immediately fulfilled. We further obtain ||t||_* = ||t||_∞ = 1 and max_{n∈[N]} (A^T t)_n^{−1} = 1. Applying Theorem 3.4 with the improved bound for q = 1 and these values, and substituting the values for ρ and τ, yields the claimed bound. This finishes the proof.

Note that ([3], Theorem 13.11) is an adaption of ([20], Lemma 11) that accounts for robustness and skips proving the ℓ1 restricted isometry property. If M ≥ (2/θ)exp(2/θ)S ln(eN/S) and D is chosen accordingly, then a D-LRBG drawn uniformly at random is, with high probability, a random walk matrix of a (2S, D, θ)-lossless expander ([3], Theorem 13.7).
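A minimal sketch of how such a measurement matrix can be drawn in practice, namely uniformly at random among all D-LRBG as in the numerical experiments below (function name ours):

```python
import numpy as np

def random_walk_d_lrbg(M, N, D, seed=0):
    """Draw the random walk matrix of a uniformly random D-left regular
    bipartite graph: each of the N columns has exactly D entries equal to
    1/D at uniformly chosen rows (so every column sums to one)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((M, N))
    for n in range(N):
        rows = rng.choice(M, size=D, replace=False)   # D distinct right vertices
        A[rows, n] = 1.0 / D
    return A

A = random_walk_d_lrbg(M=256, N=1024, D=10)
print(A.sum(axis=0)[:5])   # every column sums to 1, so t = (1, ..., 1)^T gives kappa = 1
```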

On the Robustness Bound for Lossless Expanders
If A is a random walk matrix of a (2S, D, θ)-lossless expander with θ ∈ [0, 1/6), then we can also derive a recovery guarantee for the NNLS. By ([3], Theorem 13.11), A has the ℓ1-RNSP with respect to ||·||_1 with constants ρ = 2θ/(1 − 4θ) and τ = 1/(1 − 4θ), and hence also the ℓ1-RNSP with respect to ||·||_2 with constants ρ′ = ρ and τ′ = τM^{1/2}. Similarly to the proof of Corollary 3.6 we can use Theorem 3.4 to deduce that any minimizer x# of argmin_{z≥0} ||Az − y||_2 obeys a bound analogous to that of Corollary 3.6, with the noise term ||Ax − y||_1 replaced by M^{1/2}||Ax − y||_2. If the measurement error e = y − Ax is a constant vector, i.e., e = α1, then ||e||_1 = M^{1/2}||e||_2. In this case the error bound of the NNLS is just as good as the error bound of the NNLAD. However, if e is a standard unit vector, then ||e||_1 = ||e||_2. In this case the error bound of the NNLS is significantly worse than the error bound of the NNLAD. Thus, the NNLAD performs better under peaky noise, while the NNLS and the NNLAD are tied under noise with evenly distributed mass. We will verify this numerically in Subsection 5.1. One can draw a complementary result for matrices with biased sub-Gaussian entries, which obey the ℓ2-RNSP with respect to ||·||_2 and the M+ criterion in the optimal regime [13]. Table 1 states which method has an advantage over the other in each scenario.
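A quick numerical check of the two extreme noise shapes discussed above (the numbers are purely illustrative):

```python
import numpy as np

M = 256
e_even = np.full(M, 1.0 / M)     # evenly spread noise, e = alpha * (1, ..., 1)^T
e_peaky = np.zeros(M)
e_peaky[0] = 1.0                 # peaky noise, a scaled standard unit vector

for name, e in [("even", e_even), ("peaky", e_peaky)]:
    ratio = np.abs(e).sum() / np.sqrt((e ** 2).sum())
    print(name, ratio)   # ||e||_1 / ||e||_2 is M**0.5 = 16 for even noise, 1 for peaky noise
```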

NON-NEGATIVE LEAST ABSOLUTE DEVIATION USING A PROXIMAL POINT METHOD
In this section we assume that ||·|| = ||·||_p for some p ∈ [1, ∞]. If p ∈ {1, ∞}, the NNLR can be recast as a linear program by introducing slack variables. For arbitrary p, the NNLR is a convex optimization problem and the objective function has a simple and globally bounded subdifferential. Thus, the NNLR can be solved directly with a projected subgradient method using a problem-independent step size. Such subgradient methods achieve only a convergence rate of O(log(k)k^{−1/2}) toward the optimal objective value ([25], Section 3.2.3), where k is the number of iterations performed. In the case that the norm is the ℓ2-norm, we can pass to a differentiable formulation, i.e., the NNLS argmin_{z≥0} (1/2)||Az − y||_2^2. Since the gradient of this objective is Lipschitz, the problem can be solved by a projected gradient method with constant step size, which achieves a convergence rate of O(k^{−2}) toward the optimal objective value [26, 27]. However, this does not generalize to the ℓ1-norm. The proximal point method proposed in [15] can solve the case of the ℓ1-norm with a convergence rate of O(k^{−1}) toward the optimal objective value. Please refer to Algorithm 1.
Algorithm 1 is a primal-dual algorithm. Within the loop, lines 7, 8, 1 and 2 calculate the proximal point operator of the Fenchel conjugate of ||A · − y||_1 to update the dual variable, lines 3 and 5 update the primal variable, and lines 4 and 6 perform a momentum step to accelerate convergence. Further, line 8 stores the product Ax and thereby avoids a third matrix vector multiplication. Note that σ1 and σ2 can be replaced by any values that satisfy σ1σ2 < ||A||_{2→2}^{−2}. The calculation of σ1 and σ2 might be a bottleneck for the computational complexity of the algorithm. If one wants to solve multiple problems with the same matrix, σ1 and σ2 should be calculated only once and not in each run of the algorithm. For any σ1σ2 < ||A||_{2→2}^{−2} the following convergence guarantee can be deduced from ([15], Theorem 1). Let x^k and w^k be the values of x and w at the end of the kth iteration of the while loop of Algorithm 1. Then the following statements hold true: (1) The iterates converge: the sequence (x^k)_{k∈N} converges to a minimizer of argmin_{z≥0} ||Az − y||_1. (2) The iterates are feasible: we have x^k ≥ 0 and ||w^k||_∞ ≤ 1 for all k ≥ 1.
(3) There is a stopping criterion for the iterates: in particular, if ||Ax^k − y||_1 + ⟨y, w^k⟩ ≤ 0 and A^T w^k ≥ 0, then x^k is a minimizer of argmin_{z≥0} ||Az − y||_1. (5) The averages obey a convergence rate of O(k^{−1}) toward the optimal objective value. The formal version and proof can be found in [28]. Note that this yields a convergence guarantee for both the iterates and the averages, but the convergence rate is only guaranteed for the averages. Algorithm 1 is optimized in the sense that it uses the least possible number of matrix vector multiplications per iteration, since these govern the computational complexity. A minimal sketch of a primal-dual iteration of this type is given below.
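The following Python sketch implements a Chambolle-Pock-type primal-dual iteration for the NNLAD in the spirit of Algorithm 1. It is not a verbatim reimplementation of Algorithm 1 (in particular, it does not cache the product with the extrapolated variable and does not form the averages), and all names are ours:

```python
import numpy as np

def nnlad_pdhg(A, y, iters=5000):
    """A minimal primal-dual sketch for min_{x >= 0} ||A x - y||_1.
    The dual variable w lives in the l_inf unit ball."""
    M, N = A.shape
    L = np.linalg.norm(A, 2)                 # operator norm ||A||_{2->2}
    sigma1 = sigma2 = 0.99 / L               # so that sigma1 * sigma2 < ||A||^{-2}
    x = np.zeros(N)
    x_bar = np.zeros(N)
    w = np.zeros(M)
    for _ in range(iters):
        # dual step: proximal map of the Fenchel conjugate of ||A . - y||_1,
        # which amounts to a projection onto the l_inf unit ball
        w = np.clip(w + sigma1 * (A @ x_bar - y), -1.0, 1.0)
        # primal step: projection onto the non-negative orthant
        x_new = np.maximum(x - sigma2 * (A.T @ w), 0.0)
        # extrapolation (momentum) step
        x_bar = 2.0 * x_new - x
        x = x_new
    return x

# Usage with a hypothetical sparse random walk matrix of a D-LRBG.
rng = np.random.default_rng(1)
M, N, D, S = 64, 256, 8, 5
A = np.zeros((M, N))
for n in range(N):
    A[rng.choice(M, size=D, replace=False), n] = 1.0 / D
x_true = np.zeros(N)
x_true[rng.choice(N, size=S, replace=False)] = rng.random(S) + 1.0
x_hat = nnlad_pdhg(A, A @ x_true)
print(np.sum(np.abs(x_hat - x_true)) / np.sum(np.abs(x_true)))   # small relative l1 error
```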

Iterates or Averages
The question arises whether it is better to estimate with the averages or the iterates. Numerical testing suggests that the iterates reach tolerance thresholds significantly faster than the averages. We can only give a heuristic explanation for this phenomenon. The stopping criterion of the iterates yields lim_{k→∞} A^T w^k ≥ 0. In practice we observe that A^T w^k ≥ 0 for all sufficiently large k. However, A^T w^{k+1} ≥ 0 yields x^{k+1} ≤ x^k. This monotonicity promotes the convergence of the iterates and gives a clue why the iterates seem to converge better in practice. See Figures 5, 6.

On the Convergence Rate
As stated, the NNLS achieves the convergence rate O(k^{−2}) [27] while the NNLAD only achieves the convergence rate O(k^{−1}) toward the optimal objective value. However, this should not be considered weaker, since the objective function of the NNLS is the square of a norm. If x^k are the iterates of the NNLS implementation of [27] and x* is a minimizer, then (1/2)||Ax^k − y||_2^2 − (1/2)||Ax* − y||_2^2 ∈ O(k^{−2}), and algebraic manipulation (using √(a + b) ≤ √a + √b) yields ||Ax^k − y||_2 − ||Ax* − y||_2 ∈ O(k^{−1}). Thus, the ℓ2-norm of the residual of the NNLS iterates only decays in the same order as the ℓ1-norm of the residual of the NNLAD averages.

NUMERICAL EXPERIMENTS AND APPLICATIONS
In the first part of this section we compare the NNLAD with several state-of-the-art recovery methods in terms of achieved sparsity levels and decoding time.

Properties of the Non-Negative Least Absolute Deviation Optimizer
We recall that the goal is to recover x from the noisy linear measurements y = Ax + e. To investigate properties of the minimizers of the NNLAD we compare it to the minimizers of the well studied problems basis pursuit (BP), optimally tuned basis pursuit denoising (BPDN), optimally tuned ℓ1-constrained least residual (CLR) and the NNLS, which are given by

(BP) argmin_z ||z||_1 subject to Az = y,
(BPDN) argmin_z ||z||_1 subject to ||Az − y||_1 ≤ ε,
(CLR) argmin_z ||Az − y||_1 subject to ||z||_1 ≤ τ,
(NNLS) argmin_{z≥0} ||Az − y||_2.

The NNLAD is designed to recover non-negative signals in general and, as we will see, it recovers sparse non-negative signals comparably well to the CLR and BPDN with optimal tuning, which are particularly designed for this task. Further, we compare the NNLAD to the expander iterative hard thresholding (EIHT). The EIHT is calculated by stopping the following sequence after a suitable stopping criterion is met:

x^0 := 0 and x^{k+1} := P_{Σ_{S′}}(x^k + median(y − Ax^k)) for all k ∈ N_0, with S′ = ||x||_0, (EIHT)

where median(z)_n is the median of (z_m)_{m ∈ Row({n})} and P_{Σ_S}(v) is a hard thresholding operator, i.e., some minimizer of argmin_{z∈Σ_S} ||z − v||_2; a sketch of this iteration in code is given at the end of this subsection. There is a whole class of thresholding based decoders for lossless expanders, which all need either the sparsity of the signal or the order of the expansion property as a tuning parameter. We choose the EIHT as a representative of this class, since the cluster points of its sequence have robust recovery guarantees ([3], Theorem 13.5). By convex decoders we refer to the BPDN, BP, CLR, NNLAD and NNLS. We choose the optimal tuning ε = ||e||_1 for the BPDN and τ = ||x||_1 for the CLR. The optimally tuned BPDN and CLR represent a best case benchmark. In ([29], Figure 1.1) it was noticed that tuning the BPDN with ε > ||e||_p often leads to worse estimation errors than tuning with ε < ||e||_p for p = 2. Thus, the BP is a version of the BPDN with no prior knowledge about the noise and represents a worst case benchmark. At first we investigate the properties of the estimators. In order to mitigate effects from different implementations we solve all optimization problems with the CVX package of Matlab [30, 31]. For given ℓ1SNR, r, N, M, D, S we repeat the following experiment multiple times.

Experiment 1: Draw the measurement matrix A ∈ {0, D^{−1}}^{M×N} uniformly at random among all D-LRBG, generate an S-sparse non-negative signal x, and draw a noise vector e with ||e||_1 = ||Ax||_1/ℓ1SNR whose mass is either spread evenly over all components (r = 1) or concentrated in a single component, i.e., a standard unit vector scaled by ±||Ax||_1/ℓ1SNR (r = 0). Set y := Ax + e and compute the estimator of each decoder. If r = 1, then ||e||_1 ≈ M^{1/2}||e||_2; if r = 0, then ||e||_2 = ||e||_1. Thus, these two noise distributions each represent randomly drawn noise vectors obeying one norm equivalence asymptotically tightly up to a constant. From Eqs 1, 2 we expect that the NNLS has roughly the same estimation errors as the NNLAD for r = 1, i.e., the evenly distributed noise, and significantly worse estimation errors for r = 0, i.e., the peaky noise.
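The EIHT iteration above can be sketched in code as follows. Since A is normalized by 1/D, the median update is rescaled by D here; this scaling convention is our assumption rather than taken from the article, and the function name is ours:

```python
import numpy as np

def eiht(A, y, S, D, iters=100):
    """A sketch of expander iterative hard thresholding for a random walk
    matrix A in {0, 1/D}^{M x N}: per column n, add the median of the residual
    over the rows connected to n (rescaled by D to undo the 1/D normalization,
    a convention assumed here), then keep the S largest-magnitude entries."""
    M, N = A.shape
    rows = [np.flatnonzero(A[:, n]) for n in range(N)]   # Row({n}) for every n
    x = np.zeros(N)
    for _ in range(iters):
        r = y - A @ x
        update = np.array([np.median(r[rows[n]]) for n in range(N)])
        v = x + D * update
        # hard thresholding: keep the S largest-magnitude entries
        x = np.zeros(N)
        keep = np.argsort(np.abs(v))[-S:]
        x[keep] = v[keep]
    return x

# Usage: x_hat = eiht(A, y, S=..., D=...) with A a random walk matrix of a D-LRBG.
```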

Quality of the Estimation Error for Varying Sparsity
We fix the constants r = 1, N = 1024, M = 256, D = 10, ℓ1SNR = 1000 and vary the sparsity level S ∈ [128]. For each S we repeat Experiment 1 100 times. We plot the mean of the relative ℓ1-estimation error and the mean of the logarithmic relative ℓ1-estimation error, i.e., Mean(LNℓ1E) = Mean(10 log_10(||x# − x||_1/||x||_1)), over the sparsity. The result can be found in Figures 1A,B. For S ≥ 30 the estimation error of the EIHT randomly peaks high. We deduce that the EIHT fails to recover the signal reliably for S ≥ 30, while the NNLAD and the other convex decoders succeed. This is not surprising, since by ([3], Theorem 13.15) the EIHT obeys a robust recovery guarantee for S-sparse signals whenever A is the random walk matrix of a (3S, D, θ′)-lossless expander with θ′ < 1/12. This is significantly stronger than the (2S, D, θ)-lossless expander property with θ < 1/6 required for a null space property. It might also be that the null space property is more likely than the lossless expansion property, similar to the gap between the ℓ2-restricted isometry property and the null space property [32]. However, if the EIHT recovers a signal, it recovers it significantly better than any convex method. This might be the case since the originally generated signal is indeed from Σ_S, which is enforced by the hard thresholding of the EIHT but not by the convex decoders. This suggests that it might be useful to apply thresholding to the output of any convex decoder to increase the accuracy if the original signal is indeed sparse and not only compressible. For the remainder of this subsection we focus on convex decoders.
Contrary to our expectation, the BPDN achieves worse estimation errors than all other convex decoders for S ≥ 60, even worse than the BP. The authors have no explanation for this phenomenon. Apart from that, we observe that the CLR and BP indeed perform as best and worst case benchmarks, respectively. However, the difference between BP and CLR becomes rather small for large S. We deduce that tuning becomes less important near the optimal sampling rate.
The NNLAD, NNLS and CLR achieve roughly the same estimation errors. However, note that the BPDN and CLR are optimally tuned using unknown prior information, unlike the NNLAD and NNLS. As expected, the NNLS performs roughly the same as the NNLAD, see Table 1. However, this is the result of the noise distribution for r = 1. We repeat Experiment 1 with the same constants but r = 0, i.e., e is a standard unit vector scaled by ±||Ax||_1/ℓ1SNR. We plot the mean of the relative ℓ1-estimation error and the mean of the logarithmic relative ℓ1-estimation error over the sparsity. The result can be found in Figures 2A,B.
We want to note that, similarly to Figure 1A, the EIHT works only unreliably for S ≥ 30. Even though the mean of the logarithmic relative ℓ1-estimation error of the NNLS is worse than that of the EIHT for 30 ≤ S ≤ 60, the NNLS does not fail but only approximates with a weak error bound. As the theory suggests, the NNLS performs significantly worse than the NNLAD, see Table 1. It is worth mentioning that the estimation errors of the NNLS seem to be bounded by the estimation errors of the BP. This suggests that A obeys an ℓ1 quotient property, which bounds the estimation error of any instance optimal decoder, see ([3], Lemma 11.15).

Noise-Blindness
Theorem 3.4 states that the NNLAD has an error bound similar to that of the optimally tuned CLR and BPDN. Further, by Eq. 1 the ratio of the estimation error and the noise power should be bounded by some constant. To verify this, we fix the constants r = 1, N = 1024, M = 256, D = 10, S = 32 and vary the signal to noise ratio ℓ1SNR ∈ 10·[100]. For each ℓ1SNR we repeat Experiment 1 100 times. We plot the mean of the logarithmic relative ℓ1-estimation error, i.e., Mean(LNℓ1E) = Mean(10 log_10(||x − x#||_1/||x||_1)), and the mean of the ratio of the relative ℓ1-estimation error and the ℓ1-noise power over the ℓ1SNR. The result can be found in Figures 3A,B. The logarithmic relative ℓ1-estimation errors of the different decoders stay in a constant relation to each other over the whole range of ℓ1SNR. This relation is roughly the relation we can find in Figure 1B for S = 32. As expected, the ratio of the relative ℓ1-estimation error and the ℓ1-noise power stays constant independently of the ℓ1SNR for all decoders. We deduce that the NNLAD is noise-blind. We repeat the experiment with r = 0 and obtain Figures 4A,B.
Note that ||x − x#||_1/||x||_1, and not ||x − x#||_1/(||x||_1 ||e||_1), seems to be constant. Since ||x − x#||_1/||x||_1 ≈ 1.0 · 10^{−7} is fairly small, we suspect that this is the result of CVX reaching a tolerance parameter of √eps ≈ 1.5 · 10^{−8} and terminating, while the actual optimizer might in fact be the original signal. It is remarkable that even with the incredibly small signal to noise ratio of 10 the signal can be recovered by the NNLAD with an estimation error of 1.0 · 10^{−7} for this noise distribution.

Non-Negative Least Absolute Deviation Vs Iterative Methods
To investigate the convergence rates of the NNLAD solver proposed above, we compare it to different types of decoders when e = 0. There are some sublinear time recovery methods for lossless expander matrices, including ([3], Section 13.4, [5]). These are, as the name suggests, significantly faster than the NNLAD. They, as several other greedy methods ([3], Section 13.3, [5, 18, 19, 21]), rely on a strong lossless expansion property. As a representative of all greedy and sublinear time methods we consider the EIHT, which has a linear convergence rate O(c^{−k}) toward the signal and robust recovery guarantees ([3], Theorem 13.15). The EIHT also represents a best case benchmark. As a direct competitor we consider the NNLS implemented by the methods of [27], which has a convergence rate of O(k^{−2}) toward the optimal objective value. [27] can also be used to calculate the CLR if the residual norm is the ℓ2-norm. However, calculating the projection onto the ℓ1-ball in R^N is computationally slightly more complex than the projection onto R^N_+. Thus, the CLR will be solved slightly slower than the NNLS with [27]. Note that cross-validation techniques would need to solve multiple optimization problems of a similar complexity as the NNLS to estimate a signal. As a consequence such methods have a multiple times higher complexity than the NNLS and are not considered here. As a worst case benchmark we consider a simple projected subgradient implementation of the NNLAD using the Polyak step size, i.e.,

x^{k+1} := P_{R^N_+}(x^k − λ_k A^T sign(Ax^k − y)) with λ_k := ||Ax^k − y||_1/||A^T sign(Ax^k − y)||_2^2, (NNLAD Subgrad)

where we used that the optimal objective value is 0 since e = 0; this method has a convergence rate of O(k^{−1/2}) toward the optimal objective value ([33], Section 7.2.2 & Section 5.3.2). We initialize all iterative methods with 0. The EIHT will always use the parameter S′ = ||x||_0, the NNLAD σ1 = σ2 = 0.99||A||_{2→2}^{−1} and the NNLS the parameters s = 0.99||A||_{2→2}^{−2} and α = 3.01, see [27]. Just like the BPDN and CLR, the EIHT needs an oracle to obtain some unknown prior information, in this case ||x||_0. Parameters that can be computed from A will be calculated before the timers start. This includes the adjacency structure of A for the EIHT, σ1, σ2 for the NNLAD and s, α for the NNLS, since these are considered to be part of the decoder. We will do the following experiment multiple times.

Experiment 2: Draw the measurement matrix A at random according to one of two ensembles indexed by r (see below), generate an S-sparse non-negative signal x and set y := Ax. For each iterative method compute the iterates x^k for all k ≤ 20000 and collect the relative estimation errors ||x^k − x||_1/||x||_1, the relative norms of the residuals ||Ax^k − y||_1/||y||_1 and the time to calculate the first k iterations.
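A sketch of this projected subgradient method with Polyak step size (function name ours); the optimal objective value f_opt equals 0 in the noiseless experiments of this subsection and would otherwise have to be known or estimated:

```python
import numpy as np

def nnlad_subgrad(A, y, iters=20000, f_opt=0.0):
    """Projected subgradient sketch for min_{x >= 0} ||A x - y||_1 with the
    Polyak step size.  f_opt is the optimal objective value (0 if y = A x
    for some feasible x, as in the noiseless experiments)."""
    M, N = A.shape
    x = np.zeros(N)
    for _ in range(iters):
        r = A @ x - y
        g = A.T @ np.sign(r)               # subgradient of ||A . - y||_1 at x
        gnorm2 = g @ g
        if gnorm2 == 0.0:                  # x is already a minimizer
            break
        step = (np.abs(r).sum() - f_opt) / gnorm2   # Polyak step size
        x = np.maximum(x - step * g, 0.0)  # projection onto R^N_+
    return x
```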
For r = 2 the measurement matrix in Experiment 2 represents a biased sub-Gaussian random ensemble [13] with optimal recovery guarantees for the NNLS. For r = 1 it represents a D-LRBG random ensemble with optimal recovery guarantees for the NNLAD. We fix the constants r = 1, N = 1024, M = 256, S = 16, D = 10 and repeat Experiment 2 100 times. We plot the mean of the logarithmic relative ℓ1-estimation error and the mean of the logarithmic relative ℓ1-norm of the residual, i.e., Mean(LNℓ1E) = Mean(10 log_10(||x^k − x||_1/||x||_1)) and Mean(LNℓ1R) = Mean(10 log_10(||Ax^k − y||_1/||y||_1)). The averages of the NNLAD converge significantly slower than the iterates, even though we lack a convergence rate for the iterates. We deduce that one should always use the iterates of the NNLAD to recover a signal. Surprisingly, the averages converge even slower than the subgradient method. However, this is not because the averages converge slowly, but rather because the subgradient method and all others converge faster than expected. In particular, the NNLAD iterates, the EIHT and the NNLS all converge linearly toward the signal. Further, their corresponding objective values also converge linearly toward the optimal objective value. Even the subgradient method converges almost linearly. We deduce that the NNLS is the fastest of these methods if A is a D-LRBG.
Apart from a constant, the NNLAD iterates, the EIHT and the NNLS converge in the same order. However, this behavior does not persist if we consider a different distribution for A, as one can verify by setting each component A_{m,n} to be an independent 0/1 Bernoulli random variable. While the EIHT makes more progress per iteration than the NNLS, it still takes more time to achieve the same estimation errors and residuals. We plot the mean of the time required to calculate the first k iterations in Figure 7.
The EIHT requires roughly six times as long as any other method to calculate each iteration. All methods but the EIHT can be implemented with only two matrix vector multiplications per iteration, namely one by A and one by A^T. Each of these requires roughly 2DN floating point operations, hence each iteration requires O(4DN) floating point operations. The EIHT only calculates one matrix vector multiplication, but additionally the median. This calculation is significantly slower than a matrix vector multiplication: for every n ∈ [N] we need to order a vector with D elements, which can be done in O(D log D) operations. Hence, each iteration of the EIHT requires O(DN log D) floating point operations, which explains why the EIHT requires significantly more time for each iteration.
As we have seen, the NNLS is able to recover signals faster than any other method; however, it also only obeys sub-optimal robustness guarantees for uniformly at random chosen D-LRBG, as we have seen in Figure 4A. We ask ourselves whether the NNLS is also faster with a more natural measurement scheme for it, i.e., if the A_{m,n} are independent 0/1 Bernoulli random variables. We repeat Experiment 2 100 times with r = 2 for the NNLS and r = 1 for the other methods. We again plot the mean of the logarithmic relative ℓ1-estimation error and the mean of the relative ℓ1-norm of the residual in Figures 8, 9. The NNLAD and the EIHT converge to the solution in roughly the same time. Even the subgradient implementation of the NNLAD recovers a signal in less time than the NNLS. Further, the convergence of the NNLS does not seem to be linear anymore. We deduce that the sparse structure of A has a more significant influence on the decoding time than the smoothness of the data fidelity term. We also deduce that even the subgradient method is a viable choice to recover a signal.

Non-Negative Least Absolute Deviation Vs SPGL1
As a last test we compare the NNLAD to the SPGL1 toolbox for Matlab [16, 34].

Experiment 3: 1. Draw the measurement matrix A ∈ {0, D^{−1}}^{M×N} uniformly at random among all D-LRBG. 2. Generate the signal x uniformly at random from Σ_S ∩ R^N_+ ∩ S^{N−1}_r. 3. Define the observation y := Ax. 4. Use a benchmark decoder to calculate an estimator x# and collect the relative estimation error ||x# − x||_2/||x||_2 and the time to calculate x#. 5. For each iterative method calculate iterations until the relative estimation error falls below the threshold given by the benchmark decoder, and collect the time to perform these iterations. If this threshold cannot be reached after 10^5 iterations, the recovery failed and the time is set to ∞.
We again fix the dimensions N = 1024, M = 256, D = 10 and vary S ∈ [128]. For both the BP implementation of SPGL1 and the CLR implementation of SPGL1 we repeat Experiment 3 100 times for each S. We plot the mean of the time to calculate the estimators over the sparsity in Figures 10A,B.
The NNLAD implementation is slower than both SPGL1 methods for small S. However, if we have the optimal number of measurements M ∈ O(S log(N/S)), the NNLAD is faster than both SPGL1 methods.

Summary
The implementation of the NNLAD as presented in Algorithm 1 is a reliable recovery method for sparse non-negative signals. There are methods that might be faster, but these either recover a smaller number of coefficients (EIHT, greedy methods) or they obey sub-optimal recovery guarantees (NNLS). The implementation is as fast as the commonly used SPGL1 toolbox, but has the advantage that it requires no tuning depending on the unknown x or e. Lastly, the NNLAD handles peaky noise remarkably well.

Application for Viral Detection
With the outbreak and rapid spread of the COVID-19 virus we need to test a large number of people for an infection. Since we can only test a fixed number of persons in a given time, the number of persons tested for the virus grows at most linearly. On the other hand, models suggest that the number of possibly infected persons grows exponentially. At some point, if that is not already the case, we will have a shortage of test kits and we will not be able to test every person. It is thus desirable to test as many persons as possible with as few test kits as possible.
The field of group testing develops strategies to test groups of individuals instead of single individuals in order to reduce the number of tests required to identify infected individuals. The first advances in group testing were made in [35]. For a general overview of group testing we refer to [36].
The problem of testing a large group for a virus can be modeled as a compressed sensing problem in the following way: Suppose we want to test N persons, labeled by n ∈ [N], and let x_n denote the quantity of viruses in the specimen of the nth person. If specimens are pooled according to a measurement matrix A, the test results can be modeled as y = Ax + e, where e = e^con + e^pcr and e^con accounts for contamination of samples while e^pcr accounts for inaccuracies of the measurement procedure itself. Since contamination of samples happens rarely, e^con is assumed to be peaky in the sense of Table 1, while e^pcr is assumed to have evenly spread mass but a small norm. In total, e is peaky. Often each specimen is tested separately, meaning that A is the identity. In particular, we then need at least as many test kits as specimens. Further, we estimate the true quantity of viruses x_n by x#_n := y_n, which results in the estimation error x#_n − x_n = e_n = e^con_n + e^pcr_n. Since the noise vector e is peaky, some but few tests will be inaccurate and might result in false positives or false negatives.
In general, only a fraction of the persons is indeed affected by the virus. Thus, we assume that ||x||_0 ≤ S for some small S. Since the quantity of viruses is a non-negative value, we also have x ≥ 0. Hence, we can use the NNLR to estimate x, and in particular we should use the NNLAD due to the noise being peaky. Corollary 3.6 suggests to choose A as the random walk matrix of a lossless expander or, by ([3], Theorem 13.7), to choose A as a uniformly at random drawn D-LRBG. Such a matrix A has non-negative entries and the column sums of A are not greater than one. This is a necessary requirement, since each column sum is the total amount of specimen used in the test procedure. Specifically, a fraction of D^{−1} of each specimen is used in each of exactly D test kits. By Corollary 3.6 and ([3], Theorem 13.7) this allows us to reduce the number of test kits required to M ≈ CS log(eN/S). As we have seen in Figures 4A,B we expect the NNLAD estimator to correct the errors from e^con and the estimation error to be in the order of ||e^pcr||_1, which is assumed to be small. Hence, the NNLAD estimator with a random walk matrix of a lossless expander might even result in fewer false positives and false negatives than individual testing.
Note that the lack of knowledge about the noise e favors the NNLAD recovery method over a BPDN approach. Further, since the total quantity of viruses over all patients, given by Σ_{n∈[N]} x_n = ||x||_1, is unknown, it is undesirable to use the CLR.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.