Derandomizing compressed sensing with combinatorial design

Compressed sensing is the art of reconstructing structured $n$-dimensional vectors from substantially fewer measurements than naively anticipated. A plethora of analytic reconstruction guarantees support this credo. The strongest among them are based on deep results from large-dimensional probability theory that require a considerable amount of randomness in the measurement design. Here, we demonstrate that derandomization techniques allow for considerably reducing the amount of randomness that is required for such proof strategies. More precisely, we establish uniform $s$-sparse reconstruction guarantees for $Cs\log(n)$ measurements that are chosen independently from strength-four orthogonal arrays and maximal sets of mutually unbiased bases, respectively. These are highly structured families of $\tilde{C} n^2$ vectors that imitate signed Bernoulli and standard Gaussian vectors in a (partially) derandomized fashion.


A. Motivation
Compressed sensing is the art of reconstructing structured signals from substantially fewer measurements than would naively be required for standard techniques like least squares. Although not entirely novel, rigorous treatments of this observation [1], [2] spurred considerable scientific attention from 2006 on, see e.g. [3], [4] and references therein. While deterministic results do exist, the strongest theoretical convergence guarantees still rely on randomness. Broadly, these can be grouped into two families: 1) generic measurements, such as independent Gaussian or Bernoulli vectors. Such an abundance of randomness allows for establishing very strong results by following comparatively simple and instructive proof techniques. The downside is that concrete implementations require a lot of randomness; in fact, they might be too random to be useful for certain applications. 2) structured measurements, such as random rows of a Fourier or Hadamard matrix. In contrast to generic measurements, these feature a lot of structure that is geared towards applications. Moreover, sampling random rows from a fixed matrix requires very little randomness: only $\log_2(n)$ random bits are required to sample a random DFT row, while an i.i.d. Bernoulli vector consumes $n$ bits of randomness. Structure and comparatively little randomness have a downside, however. Theoretical convergence guarantees tend to be weaker than their generic counterparts, and it should not come as a surprise that the necessary proof techniques become considerably more involved.
Typically, results of type 1) precede results of type 2). Phase retrieval via PhaseLift is a concrete example of such a development: generic convergence guarantees [5], [6] preceded (partially) de-randomized results [7], [8]. Compressed sensing is special in this regard. The two seminal works [1], [2] from 2006 provided both types of results almost simultaneously. This had an interesting consequence: despite considerable effort, to this day there still seems to be a gap between the two proof techniques.
Here, we try to close this gap by applying a method that is very well established in theoretical computer science: partial derandomization. We start with a proof technique of type 1) and considerably limit the amount of randomness required for it to work. While doing so, we keep careful track of the "amount of randomness" that is still necessary. Finally, we replace the original (generic) random measurements with pseudo-random ones that mimic them in a sufficiently accurate fashion. Our results highlight that this technique almost allows for bridging the gap between existing proof techniques for generic and structured measurements: the results are still strong, but require slightly more randomness than choosing vectors uniformly from a bounded orthogonal system, such as Fourier or Hadamard vectors.
There is also a didactic angle to this work: within the realm of signal processing, partial-derandomization techniques have been successfully applied to matrix reconstruction [8], [9] and phase retrieval via PhaseLift [7], [10], [11]. Although similar in spirit, the more involved nature of these problems may obscure the key ideas, intuition and tricks behind such an approach. However, the same techniques have not yet been applied to the original problem of compressed sensing. Here, we fill this gap and, in doing so, provide an introduction to partial derandomization techniques by example. To preserve this didactic angle, we try to keep the presentation as simple and self-contained as possible.
Finally, one may argue that compressed sensing has not fully lived up to the high expectations of the community yet, see e.g. [12]. Arguably, one of the most glaring problems for applications is the requirement of choosing individual measurements at random. While we are not able to fully overcome this drawback here, the methods described in this work do limit the amount of randomness required to generate individual structured measurements. We believe that this may help to reduce the discrepancy between "what can be proved" and "what can be done" in a variety of concrete applications.

B. Preliminaries on compressed sensing
Compressed sensing aims at reconstructing s-sparse vectors $x \in \mathbb{C}^n$ from $m \ll n$ linear measurements $y = Ax$, where $A \in \mathbb{C}^{m \times n}$. Since $m \ll n$, the matrix $A$ is singular and there are infinitely many solutions to this equation. A convex penalizing function is used to promote sparsity among these solutions. Typically, this penalizing function is the $\ell_1$-norm:
$$\min_{z \in \mathbb{C}^n} \|z\|_1 \quad \text{subject to} \quad Az = y. \quad (1)$$
Mathematical proofs for convergence to the correct solution $x \in \mathbb{C}^n$ have been established for different measurement matrices $A$. By and large, they require randomness in the sense that each row $a_i \in \mathbb{C}^n$ of $A$ is an independent copy of a random vector $a$. Prominent examples include:
1) $m = Cs\log(n/s)$ standard complex Gaussian measurements: $a_g \sim \mathcal{CN}(0, \mathbb{1})$,
2) $m = Cs\log(n/s)$ signed Bernoulli measurements: $a_{sb} \sim \{\pm 1\}^n$ with independent entries,
3) $m = Cs\log^4(n)$ random rows of a DFT matrix: $a_f \sim \{f_1, \ldots, f_n\}$,
4) for $n = 2^d$: $m = Cs\log^4(n)$ random rows of a Hadamard matrix: $a_h \sim \{h_1, \ldots, h_n\}$.
A rigorous treatment of all these cases can be found in Ref. [3]. Here, and throughout this work, $C > 0$ denotes an absolute constant whose exact value depends on the context, but it is always independent of the problem parameters $n$, $s$ and $m$. It is instructive to compare the amount of randomness that is required to generate one instance of the random vectors in question. A random signed Bernoulli vector $a_{sb} \in \mathbb{R}^n$ requires $n$ random bits (one for each coordinate), while a total of $d = \log_2(n)$ random bits suffice to select a random row $a_h \in \mathbb{R}^n$ of a Hadamard matrix. A comparison between complex standard Gaussian vectors $a_g \in \mathbb{C}^n$ and random Fourier vectors $a_f \in \mathbb{C}^n$ indicates a similar discrepancy. In summary: highly structured random vectors, like $a_f, a_h$, require exponentially fewer random bits to generate than generic random vectors, like $a_g, a_{sb}$. Importantly, this transition from generic measurements to highly structured ones comes at a price. The number of measurements required in cases (3) and (4) scales poly-logarithmically in $n$. More sophisticated approaches allow for converting this offset into a poly-logarithmic scaling in $s$ rather than $n$ [14], [15]. Another, arguably even higher, price is hidden in the proof techniques behind these results: they are considerably more involved.
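As a concrete illustration of algorithm (1), the following minimal sketch solves the $\ell_1$-minimization with the cvxpy modeling package. The package choice, the Gaussian measurement ensemble and all dimensions are illustrative assumptions, not part of the original setup.

```python
# Minimal basis pursuit sketch: recover an s-sparse x from y = A x
# by l1-minimization (algorithm (1)); illustrative parameters only.
import numpy as np
import cvxpy as cp

n, m, s = 128, 40, 5
rng = np.random.default_rng(0)

A = rng.standard_normal((m, n)) / np.sqrt(m)      # generic Gaussian measurements
x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)   # s-sparse signal
y = A @ x

z = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(z, 1)), [A @ z == y]).solve()
print("recovery error:", np.linalg.norm(z.value - x))
```

For the complex-valued measurements discussed later, cp.Variable(n, complex=True) together with the same constraint works analogously.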
The following two subsections are devoted to introducing formalisms that allow for partially de-randomizing signed Bernoulli vectors and complex standard Gaussian vectors, respectively.

C. Partially de-randomizing signed Bernoulli vectors
Throughout this work, we endow $\mathbb{C}^n$ with the standard inner product $\langle x, y \rangle = \sum_{i=1}^n \bar{x}_i y_i$. We denote the associated (Euclidean) norm by $\|z\|_2^2 = \langle z, z \rangle$. Let $a_{sb} = \sum_{i=1}^n \epsilon_i e_i$ be a signed Bernoulli vector with coefficients $\epsilon_i \sim \{\pm 1\}$ chosen independently at random (Rademacher random variables). Then
$$\mathbb{E}\left[ a_{sb} a_{sb}^T \right] = \mathbb{1},$$
which is equivalent to demanding
$$\mathbb{E}\left[ \epsilon_i \epsilon_j \right] = \delta_{ij} \quad \text{for all } 1 \leq i, j \leq n. \quad (2)$$
Independent sign entries are sufficient, but not necessary, for this feature. Indeed, suppose that $n = 2^d$ is a power of two. Then the rows of a Sylvester Hadamard matrix $h_1, \ldots, h_n$ correspond to a particular subset of sign vectors. Let $a_h \in \mathbb{R}^n$ be the random vector arising from choosing a Hadamard row uniformly at random. Then
$$\mathbb{E}\left[ a_h a_h^T \right] = \frac{1}{n} \sum_{i=1}^n h_i h_i^T = \mathbb{1},$$
because the Hadamard rows $h_i$ are proportional to an orthonormal basis and have norm $\sqrt{n}$. This in turn implies that the coordinates of a randomly selected Hadamard row obey (2), despite not being independent instances of random signs. This feature is called pairwise independence and naturally generalizes to $k \geq 2$:

Definition 1 ($k$-wise independence). Fix $k \geq 2$ and let $\epsilon_1, \ldots, \epsilon_n$ denote independent instances of a signed Bernoulli random variable. We call a random sign vector $a \in \{\pm 1\}^n$ $k$-wise independent if its components $a_1, \ldots, a_n$ obey
$$\mathbb{E}\left[ a_{i_1} \cdots a_{i_l} \right] = \mathbb{E}\left[ \epsilon_{i_1} \cdots \epsilon_{i_l} \right] \quad \text{for all } 1 \leq i_1 < \cdots < i_l \leq n \text{ with } l \leq k.$$

Explicit constructions for $k$-wise independent vectors are known for any $k$ and $n$. In this work we focus on particular constructions that rely on generalizing the following instructive example. Consider the rows of the $4 \times 3$ sign matrix
$$O = \begin{pmatrix} +1 & +1 & +1 \\ +1 & -1 & -1 \\ -1 & +1 & -1 \\ -1 & -1 & +1 \end{pmatrix}. \quad (3)$$
The first two columns run through all possible length-two combinations of $\pm 1$. The coefficients of the third column correspond to their entry-wise product. Hence, it is completely characterized by the first two, and the three columns are not mutually independent. Nonetheless, each subset of two columns does mimic independent behavior: all possible length-two combinations of $\pm 1$ occur exactly once. This ensures that a row selected uniformly at random is pairwise independent in the sense that its coefficients obey Eq. (2). This simple example may readily be generalized. A binary $M \times n$ orthogonal array of strength $t$ is a sign matrix $O \in \{\pm 1\}^{M \times n}$ such that every selection of $t$ columns contains all elements of $\{\pm 1\}^t$ an equal number of times.
Several different explicit constructions of orthogonal arrays are known. A simple counting argument reveals that the number of rows must obey $M \gtrsim n^{t/2}$. For fixed strength $t$, this number scales only polynomially in $n$, a potentially exponential improvement over the "full" array that lists all $2^n$ possible elements of $\{\pm 1\}^n$. In turn, selecting a random row of $O$ only requires $\log_2(M) = O(t \log_2(n))$ random bits and produces a random vector that is $t$-wise independent according to Definition 1. We refer to Sec. IV and Ref. [16] for a more thorough treatment of this concept.
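The instructive example above is easy to probe numerically. The following sketch (our own illustrative sanity check, not part of the original text) verifies the strength-2 property of the $4 \times 3$ array and the pairwise independence of a uniformly random row.

```python
# Verify that the 4 x 3 sign matrix from Eq. (3) is an orthogonal array of
# strength 2, and that a uniformly random row is pairwise independent.
import itertools
import numpy as np

O = np.array([[ 1,  1,  1],
              [ 1, -1, -1],
              [-1,  1, -1],
              [-1, -1,  1]])      # M = 4 runs (rows), n = 3 factors (columns)

# strength 2: every pair of columns contains each element of {+-1}^2 once
for i, j in itertools.combinations(range(O.shape[1]), 2):
    rows = [tuple(r) for r in O[:, [i, j]]]
    assert len(set(rows)) == 4

# pairwise independence of a random row: E[a_i a_j] = delta_ij, cf. Eq. (2)
print((O.T @ O) / O.shape[0])     # prints the 3 x 3 identity matrix
```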

D. Partially derandomizing complex standard Gaussian vectors
Let us now discuss another general-purpose tool for (partial) de-randomization. Concentration of measure implies that $n$-dimensional standard complex Gaussian vectors concentrate sharply around the complex sphere $\sqrt{n} S^{n-1}$ of radius $\sqrt{n}$. Hence, they behave very similarly to vectors $a_s \in \mathbb{C}^n$ chosen uniformly from this sphere. Such random vectors obey the following formula for any $k \in \mathbb{N}$ and any $z \in \mathbb{C}^n$:
$$\mathbb{E}\left[ |\langle a_s, z \rangle|^{2k} \right] = n^k \int_{S^{n-1}} |\langle w, z \rangle|^{2k}\, \mathrm{d}w = n^k \binom{n+k-1}{k}^{-1} \|z\|_2^{2k}.$$
Here, $\mathrm{d}w$ denotes the uniform measure on the complex unit sphere $S^{n-1} \subset \mathbb{C}^n$. This formula characterizes the even moments of this uniform distribution (for comparison, a complex standard Gaussian vector obeys $\mathbb{E}\left[ |\langle a_g, z \rangle|^{2k} \right] = k!\, \|z\|_2^{2k}$). The concept of $t$-designs [17] uses this moment formula as a starting point for partial derandomization. Roughly speaking, a $t$-design is a finite set of vectors of length $\sqrt{n}$ such that the uniform distribution over these vectors reproduces the uniform measure on $\sqrt{n} S^{n-1}$ up to moments of order $2t$. More precisely:

Definition 2 (complex projective $t$-design). A finite set of vectors $\{v_1, \ldots, v_N\} \subset \sqrt{n} S^{n-1}$ is a (complex projective) $t$-design if
$$\frac{1}{N} \sum_{i=1}^N |\langle v_i, z \rangle|^{2k} = n^k \binom{n+k-1}{k}^{-1} \|z\|_2^{2k} \quad \text{for all } z \in \mathbb{C}^n \text{ and all } 1 \leq k \leq t.$$

(Spherical) $t$-designs were originally developed as cubature formulas for the real-valued unit sphere [17]. The concept has since been extended to other sets. A generalization to the complex projective space $\mathbb{CP}^{n-1}$ gives rise to Definition 2. Complex projective $t$-designs are known to exist for any $t$ and any dimension $n$, see e.g. [18], [19], [20]. However, explicit constructions for $t \geq 3$ are notoriously difficult to find. In contrast, several explicit families of 2-designs have been identified. Here, we will focus on one such family. Two orthonormal bases $\{b_i\}_{i=1}^n$ and $\{\tilde{b}_j\}_{j=1}^n$ of $\mathbb{C}^n$ are called mutually unbiased if
$$|\langle b_i, \tilde{b}_j \rangle|^2 = \frac{1}{n} \quad \text{for all } 1 \leq i, j \leq n. \quad (4)$$
A prominent example for such a basis pair are the standard basis and the Fourier, or Hadamard, basis, respectively. One can show that at most $n + 1$ different orthonormal bases exist that have this property in a pairwise fashion [21, Theorem 3.5]. Such a set of $n + 1$ bases is called a maximal set of mutually unbiased bases (MMUB). For instance, in $n = 2$ the standard basis together with
$$\left\{ \tfrac{1}{\sqrt{2}}\left( e_1 + e_2 \right), \tfrac{1}{\sqrt{2}}\left( e_1 - e_2 \right) \right\} \quad \text{and} \quad \left\{ \tfrac{1}{\sqrt{2}}\left( e_1 + i e_2 \right), \tfrac{1}{\sqrt{2}}\left( e_1 - i e_2 \right) \right\}$$
forms a MMUB. Importantly, MMUBs are always (proportional to) 2-designs [22]. Explicit constructions exist for any prime power dimension $n$, and one can ensure that the standard basis is always one of them. Here we point out one construction that is particularly simple if the dimension is an (odd) prime $n \geq 5$ [23]: the standard basis vectors $e_1, \ldots, e_n \in \mathbb{C}^n$ together with all vectors whose entry-wise coefficients correspond to
$$\left( b^{(\alpha)}_\lambda \right)_j = \frac{1}{\sqrt{n}}\, \omega_n^{(j+\alpha)^3 + \lambda j}, \quad j \in [n], \quad (5)$$
form a MMUB. Here, $\omega_n = \exp\left( \frac{2\pi i}{n} \right)$ is a $n$-th root of unity. The parameter $\alpha \in [n]$ singles out one of the $n$ different bases, while $\lambda \in [n]$ labels the $n$ corresponding basis vectors. Excluding the standard basis, this set of $n^2$ vectors corresponds to all time-frequency shifts of a discrete Alltop sequence.
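To make the construction tangible, the following sketch generates the $n^2$ vectors of Eq. (5) for a small prime and verifies orthonormality within each basis as well as pairwise unbiasedness. The exponent $(j+\alpha)^3 + \lambda j$ reflects our reading of construction (5); the code is illustrative, not normative.

```python
# Generate the n Alltop bases for an (odd) prime n and check Eq. (4).
import numpy as np

n = 7                                     # (odd) prime dimension
omega = np.exp(2j * np.pi / n)
j = np.arange(n)

# B[a] is the basis labeled alpha = a; its rows are the vectors b^(a)_l
B = np.array([[omega ** (((j + a) ** 3 + l * j) % n) / np.sqrt(n)
               for l in range(n)] for a in range(n)])

for a in range(n):                        # orthonormality within each basis
    assert np.allclose(B[a] @ B[a].conj().T, np.eye(n))

for a in range(n):                        # pairwise unbiasedness, Eq. (4)
    for b in range(a + 1, n):
        assert np.allclose(np.abs(B[a] @ B[b].conj().T) ** 2, 1 / n)

print("all", n, "bases are mutually unbiased (and unbiased to the standard basis)")
```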

E. Main results
Theorem 1 (CS from orthogonal array measurements). Suppose that a matrix $A$ contains $m \geq Cs\log(2n)$ rows that are chosen independently from an orthogonal array with strength four. Then, with probability at least $1 - 2e^{-cm}$, any s-sparse $x \in \mathbb{C}^n$ can be recovered from $y = Ax$ by means of algorithm (1).

Theorem 2 (CS from time-frequency shifted Alltop sequences).
Let $n \geq 5$ be prime and suppose that $A$ contains $m \geq Cs\log(2n)$ rows that correspond to random time-frequency shifts of the Alltop sequence (5) in dimension $n$. Then, with probability at least $1 - e^{-cm}$, any s-sparse $x \in \mathbb{C}^n$ can be recovered from $y = Ax$ by means of algorithm (1).
This result actually generalizes to measurements that are sampled from a maximal set of mutually unbiased bases (excluding the standard basis). Time-frequency shifts of the Alltop sequence are one concrete construction that applies to prime dimensions only.
Note that the cardinality of the set of all Alltop shifts is $n^2$. Hence, $2\log_2(n)$ random bits suffice to select a random time-frequency shift. In turn, a total of
$$2\log_2(n)\, m \simeq 2Cs\log_2(n)\log(2n) \quad (6)$$
random bits is required for sampling a complete measurement matrix $A$. This number is exponentially smaller than the number of random bits required to generate a matrix with independent complex Gaussian entries. A similar comparison holds true for random signed Bernoulli matrices and rows sampled from a strength-four orthogonal array.
Highly structured families of vectors, such as rows of a Fourier or Hadamard matrix, require even less randomness to sample from: only $\log_2(n)$ bits are required to select such a row uniformly at random. However, existing convergence guarantees are weaker than the main results presented here. They require an order of $Cs\,\mathrm{polylog}(s)\log(n)$ random measurements to establish comparable results. Thus, the total number of random bits required for such a procedure scales like $Cs\,\mathrm{polylog}(s)\log^2(n)$. Eq. (6) still establishes a logarithmic improvement in terms of the sparsity $s$.
The recovery guarantees in Theorems 1 and 2 can be readily extended to ensure stability with respect to noise corruption in the measurements and robustness with respect to violations of the model assumption of sparsity. We refer to Sec. III for details.
We also emphasize that there are results in the literature that establish compressed sensing guarantees with comparable, or even less, randomness. Obviously, deterministic constructions are the extreme case in this regard. Early results suffer from a "quadratic bottleneck": the number of measurements must scale quadratically in the sparsity, $m \gtrsim s^2$. Although this obstacle has been overcome, existing progress is still comparatively mild, see Refs. [25], [26], [27]. Closer in spirit to this work is Ref. [28]. There, the authors employ the Legendre symbol, which is well known for its pseudorandom behavior, to partially derandomize a signed Bernoulli matrix. In doing so, they establish uniform s-sparse recovery from $m \geq Cs\log^2(s)\log(n)$ measurements that require an order of $s\log(s)\log(n)$ random bits to generate. Compared to the main results presented here, this result gets by with less randomness, but requires more measurements. The proof technique is also very different.
To date, the strongest de-randomized reconstruction guarantees hail from a close connection between s-sparse recovery and Johnson-Lindenstrauss embeddings [29], [30]. These have a wide range of applications in modern data science. Kane and Nelson [31] established a very strong partial de-randomization of such embeddings. This result may be used to establish uniform s-sparse recovery for $m = Cs\log(n/s)$ measurements that require an order of $s\log(s)\log(n/s)$ random bits. This result surpasses the main results presented here in both sampling rate and required randomness.
However, this strong result follows from "reducing" the problem of s-sparse recovery to a (seemingly) very different problem: finding Johnson-Lindenstrauss embeddings. Such a reduction typically does not preserve problem-specific structure. In contrast, the approach presented here addresses the problem of sparse recovery directly and relies on tools from signal processing. In doing so, we maintain structural properties that are common in several applications of s-sparse recovery. Orthogonal array measurements, for instance, have $\pm 1$ entries. This is well suited for the single-pixel camera [32]. Alltop sequence constructions, on the other hand, have successfully been applied to stylized radar problems [33]. Both types of measurements also have the property that every entry has unit modulus. This is an important feature for CDMA applications [34]. Having pointed out these high-level connections, we want to emphasize that careful, problem-specific adaptations may be required to rigorously exploit them. The framework developed here may serve as a guideline on how to achieve this goal in concrete scenarios.

II. PROOFS
A. Textbook-worthy proof for real-valued compressed sensing with Gaussian measurements

This section is devoted to summarizing an elegant argument that is originally due to Rudelson and Vershynin [14], see also [35], [36], [37] for arguments that are similar in spirit. This argument only applies to s-sparse recovery of real-valued signals. We will generalize a similar idea to the complex case later on.
In this work we are concerned with uniform reconstruction guarantees: with high probability, a single realization of the measurement matrix $A$ allows for reconstructing any s-sparse vector $x$ by means of $\ell_1$-regularization (1). A necessary prerequisite for uniform recovery is the demand that no s-sparse vector is contained in the kernel, or nullspace, of $A$. This condition is captured by the nullspace property (NSP). Define
$$T_s = \left\{ z \in \mathbb{C}^n:\ \|z\|_2 = 1,\ \sigma_s(z) \leq \sqrt{s} \right\}, \quad (7)$$
where $\sigma_s(x) = \inf_{\|z\|_0 \leq s} \|x - z\|_1$ for $x \in \mathbb{C}^n$ is the approximation error (measured in $\ell_1$-norm) one incurs when approximating $x$ with a s-sparse vector. A matrix $A$ obeys the NSP of order $s$ if
$$\inf_{z \in T_s} \|Az\|_2 > 0. \quad (8)$$
The set $T_s$ is a subset of the unit sphere that contains all normalized s-sparse vectors. This justifies the informal definition of the NSP: no s-sparse vector is an element of the nullspace of $A$. Importantly, the NSP is not only necessary, but also sufficient for uniform recovery, see e.g. [3, Theorem 4.5].
Hence, uniform recovery of s-sparse signals readily follows from establishing Rel. (8). The nullspace property and its relation to s-sparse recovery have long been somewhat folklore. We refer to Ref. [3] for a discussion of its origin. The following powerful statement allows for exploiting generic randomness in order to establish nullspace properties. It is originally due to Gordon [38], but we utilize a more modern reformulation, see [3, Theorem 9.21].
Theorem 3 (Gordon's escape through a mesh). Let $A \in \mathbb{R}^{m \times n}$ be a real-valued standard Gaussian matrix and let $E \subseteq S^{n-1}$ be a subset of the real-valued unit sphere. Define the Gaussian width
$$\ell(E) = \mathbb{E}\left[ \sup_{z \in E} \langle a_g, z \rangle \right],$$
where the expectation is over realizations $a_g \sim \mathcal{N}(0, \mathbb{1})$ of a standard Gaussian random vector. Then, for $t \geq 0$, the bound
$$\inf_{z \in E} \|Az\|_2 \geq \sqrt{m-1} - \ell(E) - t$$
holds with probability at least $1 - e^{-t^2/2}$.

This is a deep statement that connects random matrix theory to geometry: the Gaussian width is a rough measure of the size of the set $E \subseteq S^{n-1}$. Setting $E = T_s$ allows us to conclude that a matrix $A$ encompassing $m$ independent Gaussian measurements is very likely to obey the s-NSP (8), provided that $m - 1$ exceeds $\ell(T_s)^2$. In order to derive an upper bound on $\ell(T_s)$, we may use the inclusion $T_s \subset 2\, \mathrm{conv}\left( \Sigma^n_s \right)$, see e.g. [35, Lemma 3] and [14, Lemma 4.5]. Here, $\Sigma^n_s \subseteq S^{n-1}$ denotes the set of all s-sparse vectors with unit length. In turn,
$$\ell(T_s) \leq 2\, \mathbb{E}\left[ \sup_{z \in \Sigma^n_s} \langle a_g, z \rangle \right], \quad (9)$$
because the linear function $z \mapsto \langle a_g, z \rangle$ achieves its maximum value at the boundary $\Sigma^n_s$ of the convex set $\mathrm{conv}\left( \Sigma^n_s \right)$. The right hand side of (9) is the expected supremum of a Gaussian process indexed by $z \in \Sigma^n_s$. Dudley's inequality [39], see also [3, Theorem 8.23], states
$$\mathbb{E}\left[ \sup_{z \in \Sigma^n_s} \langle a_g, z \rangle \right] \leq 4\sqrt{2} \int_0^\infty \sqrt{\log N\left( \Sigma^n_s, \|\cdot\|_2, u \right)}\, \mathrm{d}u,$$
where $N\left( \Sigma^n_s, \|\cdot\|_2, u \right)$ are covering numbers associated with the set $\Sigma^n_s$. They are defined as the smallest cardinality of a $u$-covering net with respect to the Euclidean distance. A volumetric counting argument yields $N\left( \Sigma^n_s, \|\cdot\|_2, u \right) \leq \binom{n}{s}\left( 1 + \frac{2}{u} \right)^s$ and, in turn,
$$\mathbb{E}\left[ \sup_{z \in \Sigma^n_s} \langle a_g, z \rangle \right] \leq c\sqrt{s\log(en/s)},$$
where $c$ is an absolute constant. This readily yields the following assertion.
Theorem 4 (NSP for Gaussian measurements). A matrix $A$ comprising $m \geq Cs\log(en/s)$ independent real-valued standard Gaussian measurements obeys the (real-valued) s-NSP with probability at least $1 - e^{-cm}$.

This argument is exemplary for generic proof techniques: strong results from probability theory allow for establishing close-to-optimal results in a relatively succinct fashion.
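The Gaussian width of $\Sigma^n_s$ is also easy to probe numerically: for a fixed realization $a_g$, the supremum $\sup_{z \in \Sigma^n_s} \langle a_g, z \rangle$ is attained by normalizing the $s$ largest entries of $a_g$ in magnitude, so it equals the $\ell_2$-norm of these entries. A minimal Monte Carlo sketch with illustrative parameters:

```python
# Monte Carlo estimate of the Gaussian width of Sigma_s^n: the supremum of
# <g, z> over s-sparse unit vectors z equals the l2-norm of the s largest |g_i|.
import numpy as np

n, s, trials = 1024, 10, 2000
rng = np.random.default_rng(1)

g = rng.standard_normal((trials, n))
top_s = np.sort(np.abs(g), axis=1)[:, -s:]        # s largest magnitudes per draw
width = np.linalg.norm(top_s, axis=1).mean()      # empirical Gaussian width

print(width, "vs", np.sqrt(s * np.log(np.e * n / s)))   # same order of magnitude
```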

B. Extending the scope to subgaussian measurements
The extended arguments presented here are largely due to Dirksen, Lecué and Rauhut [36]. Again, we will focus on the real-valued case.
Gordon's escape through a mesh is only valid for Gaussian random matrices A. Novel methods are required to extend this proof technique beyond this idealized case. Comparatively recently, Mendelson provided one by generalizing Gordon's escape through a mesh [40], [41].
Theorem 5 (Mendelson's small ball method, Tropp's formulation [37]). Suppose that $A$ is a random $m \times n$ matrix whose rows correspond to $m$ independent realizations of a random vector $a \in \mathbb{R}^n$. Fix a set $E \subseteq \mathbb{R}^n$ and define
$$Q_\xi(a, E) = \inf_{z \in E} \Pr\left[ |\langle a, z \rangle| \geq \xi \right] \quad \text{and} \quad W_m(a, E) = \mathbb{E}\left[ \sup_{z \in E} \langle h, z \rangle \right], \quad \text{where } h = \frac{1}{\sqrt{m}} \sum_{i=1}^m \epsilon_i a_i$$
is the empirical average over $m$ independent copies of $a$, weighted by uniformly random signs $\epsilon_i \sim \{\pm 1\}$. Then, for any $t, \xi > 0$,
$$\inf_{z \in E} \|Az\|_2 \geq \xi\sqrt{m}\, Q_{2\xi}(a, E) - 2 W_m(a, E) - \xi t$$
with probability at least $1 - e^{-t^2/2}$.

It is worthwhile to point out that for real-valued Gaussian vectors this result recovers Theorem 3 up to constants. Fix $\xi > 0$ of appropriate size. Then, $E \subseteq S^{n-1}$ ensures that $\xi Q_{2\xi}(a_g, E)$ is constant. Moreover, $W_m(a_g, E)$ reduces to the usual Gaussian width $\ell(E)$.
Mendelson's small ball method can be used to establish the nullspace property for independent random measurements $a \in \mathbb{R}^n$ that exhibit subgaussian behavior:
$$\Pr\left[ |\langle a, y \rangle| \geq \theta \|y\|_2 \right] \leq 2 e^{-c\theta^2} \quad \text{for all } y \in \mathbb{R}^n,\ \theta > 0. \quad (10)$$
A signed Bernoulli vector, for instance, has subgaussian marginals. Moreover, it obeys
$$\mathbb{E}\left[ \langle a_{sb}, z \rangle^2 \right] = \|z\|_2^2 \quad (11)$$
and
$$\mathbb{E}\left[ \langle a_{sb}, z \rangle^4 \right] \leq 3 \|z\|_2^4, \quad (12)$$
because there are 3 possible pairings of four indices. Now, set $E = T_s \subset S^{n-1}$. An application of the Paley-Zygmund inequality then allows for bounding the parameter $Q_{2\xi}(a_{sb}, T_s)$ in Mendelson's small ball method from below:
$$Q_{2\xi}(a_{sb}, T_s) \geq \inf_{z \in T_s} \left( 1 - 4\xi^2 \right)^2 \frac{\left( \mathbb{E}\left[ \langle a_{sb}, z \rangle^2 \right] \right)^2}{\mathbb{E}\left[ \langle a_{sb}, z \rangle^4 \right]} \geq \frac{\left( 1 - 4\xi^2 \right)^2}{3}.$$
This lower bound is constant for any $\xi \in (0, 1/2)$. Next, note that $X_z = \langle z, h \rangle$ is a stochastic process that is indexed by $z \in \mathbb{R}^n$. This process is centered ($\mathbb{E}[X_z] = 0$) and Eq. (10) implies that it is also subgaussian (at least for any $z \in \Sigma^n_s$). Moreover, $\left( \mathbb{E}\left[ |X_z - X_y|^2 \right] \right)^{1/2} = \|z - y\|_2$ readily follows from (11). Unlike Gordon's escape through a mesh, Dudley's inequality does remain valid for such stochastic processes with subgaussian marginals. We can now repeat the width analysis from the previous section to obtain
$$W_m(a_{sb}, T_s) \leq 2\, \mathbb{E}\left[ \sup_{z \in \Sigma^n_s} \langle z, h \rangle \right] \leq c\sqrt{s\log(en/s)}.$$
Fixing $\xi > 0$ sufficiently small, setting $t = \tilde{c}\sqrt{m}$ and inserting these bounds into the assertion of Theorem 5 yields the following result.
Theorem 6 (NSP for signed Bernoulli measurements). A matrix $A$ encompassing $m \geq Cs\log(en/s)$ random signed Bernoulli measurements obeys the real-valued s-NSP with probability at least $1 - e^{-\tilde{c}m}$.
A similar result remains valid for other classes of independent measurements with subgaussian marginals (10).
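The moment identities (11) and (12) that feed the Paley-Zygmund argument are straightforward to confirm empirically. A minimal sketch with illustrative parameters:

```python
# Empirical check of E<a,z>^2 = |z|_2^2 (11) and E<a,z>^4 <= 3|z|_2^4 (12)
# for signed Bernoulli vectors and a fixed unit-norm direction z.
import numpy as np

n, trials = 64, 200_000
rng = np.random.default_rng(2)

z = rng.standard_normal(n)
z /= np.linalg.norm(z)                         # normalize so |z|_2 = 1
a = rng.choice([-1.0, 1.0], size=(trials, n))  # independent random signs
inner = a @ z

print("2nd moment:", (inner ** 2).mean())      # ~ 1
print("4th moment:", (inner ** 4).mean())      # <= 3, close to 3 for generic z
```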

C. Generalization to complex-valued signals and partial derandomization
The nullspace property, as well as its connection to uniform s-sparse recovery, readily generalizes to complex-valued s-sparse vectors. A similar extension applies to Mendelson's small ball method:

Theorem 7 (Mendelson's small ball method for complex vector spaces). Suppose that the rows of $A$ correspond to $m$ independent copies of a random vector $a \in \mathbb{C}^n$. Fix a set $E \subset \mathbb{C}^n$ and define
$$Q_\xi(a, E) = \inf_{z \in E} \Pr\left[ |\langle a, z \rangle| \geq \xi \right] \quad \text{and} \quad W_m(a, E) = \mathbb{E}\left[ \sup_{z \in E} \mathrm{Re}\left( \langle h, z \rangle \right) \right], \quad \text{where } h = \frac{1}{\sqrt{m}} \sum_{i=1}^m \epsilon_i a_i.$$
Then, for any $t, \xi > 0$,
$$\inf_{z \in E} \|Az\|_2 \geq \xi\sqrt{\tfrac{m}{2}}\, Q_{2^{3/2}\xi}(a, E) - 2 W_m(a, E) - \xi t$$
with probability at least $1 - e^{-t^2/2}$.

Such a generalization was conjectured by Tropp [37], but we are not aware of any rigorous proof in the literature. We provide one in Subsection V-B and believe that such an extension may be of independent interest. This extension allows for generalizing the arguments from the previous subsection to the complex-valued case.
Let us now turn to the main scope of this work: partial de-randomization. Effectively, Mendelson's small ball method reduces the task of establishing nullspace properties to bounding the two parameters $Q_{2^{3/2}\xi}(a, T_s)$ and $W_m(a, T_s)$ in an appropriate fashion. A lower bound on the former readily follows from the Paley-Zygmund inequality, provided that the random vector $a$ obeys $\mathbb{E}\left[ |\langle a, z \rangle|^2 \right] = \|z\|_2^2$ for all $z \in \mathbb{C}^n$ (isotropy), as well as the 4th moment bound $\mathbb{E}\left[ |\langle a, z \rangle|^4 \right] \leq C_4 \|z\|_2^4$ for all $z \in \mathbb{C}^n$, where $C_4 > 0$ is a constant:
$$Q_{2^{3/2}\xi}(a, T_s) \geq \frac{\left( 1 - 8\xi^2 \right)^2}{C_4}. \quad (13)$$
In contrast, establishing an upper bound on $W_m(a, T_s)$ via Dudley's inequality requires subgaussian marginals (10) (with constants that must not depend on the ambient dimension). This implicitly imposes stringent constraints on all moments simultaneously. An additional assumption allows us to considerably weaken these demands:
$$\max_{1 \leq k \leq n} |\langle e_k, a \rangle|^2 = 1 \quad \text{almost surely (incoherence)}. \quad (14)$$
Incoherence has long been identified as a key ingredient for developing s-sparse recovery guarantees. Here, we utilize it to establish an upper bound on $W_m(a, T_s)$ that does not rely on subgaussian marginals.

Lemma 1. Let $a \in \mathbb{C}^n$ be a random vector that is isotropic and incoherent. Let $T_s \subset \mathbb{C}^n$ be the complex-valued generalization of the set defined in Eq. (7) and assume $m \geq \log(2n)$. Then,
$$W_m(a, T_s) \leq 4\sqrt{2s\log(2n)}. \quad (15)$$

This bound only requires an appropriate scaling of the first two moments (isotropy). However, this partial derandomization comes at a price: the bound scales logarithmically in $n$ rather than $n/s$. We defer a proof of this statement to Subsection V-A below. Inserting the bounds (13) and (15) into the assertion of Theorem 7 readily yields the main technical result of this work:

Theorem 8. Suppose that $a \in \mathbb{C}^n$ is a random vector that obeys incoherence, isotropy and the 4th moment bound. Then, choosing $m \geq Cs\log(n)$ instances of $a$ uniformly at random results in a measurement matrix $A$ that obeys the complex-valued nullspace property of order $s$ with probability at least $1 - 2e^{-cm}$.
In complete analogy to the real-valued case, the complex nullspace property ensures uniform recovery of s-sparse vectors $x \in \mathbb{C}^n$ from linear measurements of the form $y = Ax$ via algorithm (1).

D. Recovery guarantee for strength-four orthogonal arrays
Suppose that $a_{oa} \in \{\pm 1\}^n$ is chosen uniformly from an orthogonal array of strength four. By definition, all coefficients of $a_{oa}$ have unit modulus, i.e.
$$\max_{1 \leq k \leq n} |\langle e_k, a_{oa} \rangle|^2 = 1 \quad \text{almost surely},$$
which establishes incoherence. Moreover, the components $a_i$ of $a_{oa}$ obey $\mathbb{E}\left[ a_i a_j \right] = \mathbb{E}\left[ \epsilon_i \epsilon_j \right] = \delta_{ij}$, because 4-wise independence necessarily implies 2-wise independence. Isotropy readily follows:
$$\mathbb{E}\left[ |\langle a_{oa}, z \rangle|^2 \right] = \sum_{i,j=1}^n \bar{z}_i z_j\, \mathbb{E}\left[ a_i a_j \right] = \sum_{i=1}^n |z_i|^2 = \|z\|_2^2.$$
Finally, 4-wise independence ensures that all fourth moments mimic their signed Bernoulli counterparts: $\mathbb{E}\left[ |\langle a_{oa}, z \rangle|^4 \right] = \mathbb{E}\left[ |\langle a_{sb}, z \rangle|^4 \right] \leq 3\|z\|_2^4$, which is the 4th moment bound with $C_4 = 3$. Therefore $a_{oa}$ meets all the requirements of Theorem 8. The first main result then readily follows from the fact that the complex nullspace property ensures uniform recovery of all s-sparse signals.

E. Recovery guarantee for mutually unbiased bases
Suppose that $a_{mub} \in \mathbb{C}^n$ is chosen uniformly from a maximal set of $n$ mutually unbiased bases (excluding the standard basis) whose elements are re-normalized to length $\sqrt{n}$. A random time-frequency shift of the Alltop sequence (5) is a concrete example of such a sampling procedure, provided that the dimension $n \geq 5$ is an (odd) prime.
The vector $a_{mub}$ is chosen from a union of $n$ bases that are all mutually unbiased with respect to the standard basis, see Eq. (4). Together with the re-normalization $\|a_{mub}\|_2 = \sqrt{n}$, this readily establishes incoherence:
$$\max_{1 \leq k \leq n} |\langle e_k, a_{mub} \rangle|^2 = n \cdot \frac{1}{n} = 1 \quad \text{with probability one}.$$
Next, by assumption $a_{mub}$ is chosen uniformly from a union of $n$ re-scaled orthonormal bases $\sqrt{n}\, b^{(\alpha)}_\lambda$, so that
$$\mathbb{E}\left[ |\langle a_{mub}, z \rangle|^2 \right] = \frac{1}{n^2} \sum_{\alpha=1}^{n} \sum_{\lambda=1}^{n} n\, |\langle b^{(\alpha)}_\lambda, z \rangle|^2 = \frac{1}{n} \sum_{\alpha=1}^{n} \|z\|_2^2 = \|z\|_2^2,$$
which establishes isotropy. Finally, a maximal set of $(n+1)$ mutually unbiased bases, including the standard basis which we denote by $b^{(n+1)}_k = e_k$, forms a 2-design according to Definition 2. For any $z \in \mathbb{C}^n$ this property ensures
$$\frac{1}{n(n+1)} \sum_{\alpha=1}^{n+1} \sum_{\lambda=1}^{n} |\langle \sqrt{n}\, b^{(\alpha)}_\lambda, z \rangle|^4 = \frac{2n}{n+1} \|z\|_2^4 \leq 2 \|z\|_2^4.$$
Since every summand is non-negative, dropping the standard basis contribution and re-normalizing yields $\mathbb{E}\left[ |\langle a_{mub}, z \rangle|^4 \right] \leq \frac{n+1}{n} \cdot \frac{2n}{n+1} \|z\|_2^4 = 2\|z\|_2^4$, which implies the 4th moment bound (with $C_4 = 2$). In summary, the random vector $a_{mub} \in \mathbb{C}^n$ meets the requirements of Theorem 8. Theorem 2 then readily follows from the implications of the nullspace property for s-sparse recovery.
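As a numerical cross-check, the following sketch (which re-uses our reading of construction (5) and is illustrative only) estimates the fourth moment of $a_{mub}$ for a random direction $z$ and confirms the bound $\mathbb{E}\left[ |\langle a_{mub}, z \rangle|^4 \right] \leq 2\|z\|_2^4$.

```python
# Estimate E|<a_mub, z>|^4 over the n^2 rescaled Alltop vectors; the 2-design
# argument predicts a value of at most 2 for unit-norm z (C_4 = 2).
import numpy as np

n = 7                                        # (odd) prime dimension
omega = np.exp(2j * np.pi / n)
j = np.arange(n)
V = np.array([omega ** (((j + a) ** 3 + l * j) % n)   # entries of modulus 1,
              for a in range(n) for l in range(n)])   # so each |v|_2 = sqrt(n)

rng = np.random.default_rng(3)
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z /= np.linalg.norm(z)                       # |z|_2 = 1

print((np.abs(V.conj() @ z) ** 4).mean(), "<= 2")
```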

III. EXTENSION TO NOISY MEASUREMENTS
The nullspace property may be generalized to address two imperfections in s-sparse recovery simultaneously: (i) the vector $x \in \mathbb{C}^n$ may only be approximately sparse in the sense that it is well-approximated by a s-sparse vector; (ii) the measurements may be corrupted by additive noise: $y = Ax + e$ with $e \in \mathbb{C}^m$.
To state this generalization, we need some additional notation. For $z \in \mathbb{C}^n$ and $1 \leq s \leq n$, let $z_s \in \mathbb{C}^n$ be the vector that only contains the $s$ largest entries in modulus. All other entries are set to zero. Likewise, we write $z_{\bar{s}} = z - z_s$ to denote the remainder. In particular, $\sigma_s(z) = \|z_{\bar{s}}\|_1$. An $m \times n$ matrix $A$ obeys the robust nullspace property of order $s$ with parameters $\rho \in (0, 1)$ and $\tau > 0$ if
$$\|z_s\|_2 \leq \frac{\rho}{\sqrt{s}} \|z_{\bar{s}}\|_1 + \tau \|Az\|_2 \quad \text{for all } z \in \mathbb{C}^n,$$
see e.g. [3, Definition 4.21]. This extension of the nullspace property is closely related to stable s-sparse recovery from noisy measurements via basis pursuit denoising:
$$\min_{z \in \mathbb{C}^n} \|z\|_1 \quad \text{subject to} \quad \|Az - y\|_2 \leq \eta. \quad (16)$$
Here, $\eta > 0$ denotes an upper bound on the strength of the noise corruption: $\|e\|_2 \leq \eta$. Indeed, [3, Theorem 4.22] draws the following connection: suppose that $A$ obeys the robust nullspace property with parameters $\rho, \tau$. Then, the solution $z \in \mathbb{C}^n$ of (16) is guaranteed to obey
$$\|z - x\|_2 \leq \frac{D_1}{\sqrt{s}}\, \sigma_s(x) + D_2\, \eta, \quad (17)$$
where $D_1 = (1+\rho)^2/(1-\rho)$ and $D_2 = (3+\rho)\tau/(1-\rho)$. The first term on the r.h.s. vanishes if $x$ is exactly s-sparse and remains small if $x$ is well approximated by a s-sparse vector. The second term scales linearly in the noise bound $\eta \geq \|e\|_2$ and vanishes in the absence of any noise corruption.
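A minimal sketch of basis pursuit denoising (16), again with the cvxpy package as an illustrative choice; the noise level $\eta$ is assumed to be known.

```python
# Basis pursuit denoising (16): minimize |z|_1 subject to |A z - y|_2 <= eta.
import numpy as np
import cvxpy as cp

n, m, s, eta = 128, 48, 5, 0.1
rng = np.random.default_rng(4)

A = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
e = rng.standard_normal(m)
y = A @ x + eta * e / np.linalg.norm(e)      # additive noise with |e|_2 = eta

z = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(z, 1)), [cp.norm(A @ z - y, 2) <= eta]).solve()
print("recovery error:", np.linalg.norm(z.value - x))   # scales with eta, cf. (17)
```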
In the previous section, we have established the classical nullspace property for measurements that are chosen independently from a vector distribution that is isotropic, incoherent and obeys a bound on the 4th moments. This argument may readily be extended to establish the robust nullspace property with relatively little extra effort. To this end, define the set
$$T_{\rho,s} = \left\{ z \in \mathbb{C}^n:\ \|z\|_2 = 1,\ \|z_s\|_2 > \frac{\rho}{\sqrt{s}} \|z_{\bar{s}}\|_1 \right\}.$$
A moment of thought reveals that the matrix $A$ obeys the robust nullspace property with parameters $\rho, \tau$ if
$$\inf_{z \in T_{\rho,s}} \|Az\|_2 \geq \frac{1}{\tau}. \quad (18)$$
What is more, the following inclusion formula is also valid:
$$T_{\rho,s} \subset \left\{ z \in \mathbb{C}^n:\ \|z\|_2 = 1,\ \|z\|_1 \leq \left( 1 + \rho^{-1} \right)\sqrt{s} \right\},$$
so that the width analysis behind Lemma 1 carries over at the expense of an additional $\rho$-dependent factor. Now, suppose that $A$ subsumes $m \geq C\rho^{-2} s \log(2n)$ independent copies of the random vector $a \in \mathbb{C}^n$, where $C > 0$ is sufficiently large. Then, Theorem 7 readily asserts
$$\inf_{z \in T_{\rho,s}} \|Az\|_2 \geq \tilde{c}\sqrt{m} \quad (19)$$
with probability at least $1 - 2e^{-cm}$. Previously, we employed Mendelson's small ball method to simply assert that a similar infimum is strictly positive. Eq. (19) provides a strictly positive lower bound with comparable effort. Comparing this relation to Eq. (18) highlights that this is enough to establish the robust nullspace property with parameters $\rho$ and $\tau = \left( \tilde{c}\sqrt{m} \right)^{-1}$ with high probability. In turn, a stable generalization of the main recovery guarantee follows from Eq. (17).
Theorem 9. Fix $\rho \in (0, 1)$ and $s \in \mathbb{N}$. Suppose that we sample $m \geq C\rho^{-2} s\log(n)$ independent copies of an isotropic, incoherent random vector $a \in \mathbb{C}^n$ that also obeys the 4th moment bound. Then, with probability at least $1 - 2e^{-cm}$, the resulting measurement matrix $A$ allows for stable, uniform recovery of (approximately) s-sparse vectors. More precisely, the solution $z$ of (16) is guaranteed to obey
$$\|z - x\|_2 \leq \frac{D_1}{\sqrt{s}}\, \sigma_s(x) + \frac{D_2}{\sqrt{m}}\, \eta,$$
where $D_1, D_2 > 0$ depend only on $\rho$.

IV. NUMERICAL EXPERIMENTS
In this part we demonstrate the performance that can be achieved with our proposed derandomized constructions and compare it to generic measurement matrices (Gaussian, signed Bernoulli). Since the orthogonal array construction is more involved, we first provide additional details relevant for the numerical experiments.

A. Details on orthogonal arrays
An orthogonal array $OA(\lambda\sigma^t, n, \sigma, t)$ of strength $t$, with $n$ factors and $\sigma$ levels, is a $\lambda\sigma^t \times n$ array of $\sigma$ different symbols such that in any $t$ columns each of the $\sigma^t$ possible ordered $t$-tuples occurs in exactly $\lambda$ rows. Arrays with $\lambda = 1$ are called simple. A comprehensive treatment can be found in the book [16]; known arrays are listed in several libraries, for example http://neilsloane.com/oadir/ or http://pietereendebak.nl/oapage/. Often the symbol alphabet is not relevant, but we use the set $\mathbb{Z}_\sigma = \{0, \ldots, \sigma-1\}$ for concreteness. Such arrays can be represented as a matrix in $\mathbb{Z}_\sigma^{\lambda\sigma^t \times n}$. For $\sigma = q^p$ with $q$ prime, the simple orthogonal array $OA(\sigma^t, n, \sigma, t)$ is called linear if the $q^{pt}$ rows of the matrix form a vector space over $\mathbb{F}_q$. The runs of an orthogonal array (the rows of the corresponding matrix) can also be interpreted as codewords of a code and vice versa. The array is linear if and only if the corresponding code is linear [16, Chapter 4]. This relationship allows one to employ classical code constructions to construct orthogonal arrays.
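This relationship is easy to make concrete. The $2^4$ codewords of the $[5, 4]$ even-weight (single parity check) code form the array $OA(16, 5, 2, 4)$ that serves as a seed below: its dual is the $[5, 1, 5]$ repetition code, and the strength of a linear array equals the minimum distance of the dual code minus one [16, Chapter 4]. A short sketch:

```python
# Build OA(16, 5, 2, 4) as the codeword list of the [5,4] even-weight code
# and verify strength 4 directly: every 4 columns contain all 16 patterns.
import itertools
import numpy as np

G = np.array([[1, 0, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1]])   # generator matrix of the even-weight code

msgs = np.array(list(itertools.product([0, 1], repeat=4)))
OA = (msgs @ G) % 2               # 16 runs (rows) x 5 factors (columns)

for cols in itertools.combinations(range(5), 4):
    patterns = {tuple(r) for r in OA[:, cols]}
    assert len(patterns) == 16    # each element of {0,1}^4 occurs exactly once
print("OA(16, 5, 2, 4) verified")
```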

B. Counting bits
In this work we propose to generate $m \times n$ sampling matrices $A$ by selecting $m \leq M = \lambda\sigma^t$ rows at random from an orthogonal array $OA(\lambda\sigma^4, n, \sigma, 4)$, possibly removing the bias (subtracting $(\sigma-1)/2$ per component) and re-scaling appropriately. Intuitively, $m\log_2(M)$ bits are then required to specify such a matrix $A$. For $t = 4$ and $k = n$ factors, a classical lower bound due to Rao [42] demands
$$M \geq 1 + n(\sigma - 1) + \binom{n}{2}(\sigma - 1)^2.$$
Arrays that saturate this bound are called tight (or complete). In summary, an order of $s\log^2(n)$ bits are required to sample an $m \times n$ matrix $A$ with $m \geq Cs\log(n)$ rows according to this procedure.

C. Strength-4 Constructions
For compressed sensing applications we want arrays with a large number of factors $n$, since this corresponds to the ambient dimension of the sparse vectors to recover. On the other hand, the run size $M$ should scale moderately, so that the random matrices can be described with only a few bits. Most constructions use an existing orthogonal array as a seed to construct larger arrays. Known binary arrays of strength 4 are, for example, the simple array $OA(16, 5, 2, 4)$ or $OA(80, 6, 2, 4)$. Ref. [43] proposes an algorithm that uses a linear orthogonal array $OA(N, n, \sigma, t)$ as a seed to construct a linear orthogonal array $OA(N^2, n^2 + 2n, \sigma, t)$. This procedure may then be iterated.
D. Numerical results for orthogonal arrays

Figure 1 summarizes the empirical performance of basis pursuit (1) with independent orthogonal array measurements. We consider real-valued signals and quantify the performance in terms of the normalized $\ell_2$-recovery error (NMSE). To construct the orthogonal array, the algorithm of Ref. [43] is applied twice: $OA(16, 5, 2, 4) \to OA(256, 35, 2, 4) \to OA(65536, 1295, 2, 4)$. Then $m = 323$ rows are sampled uniformly from this array, i.e. the sampling matrix $A$ has $\pm 1$ entries (mapping $\{0, 1\} \to \{\pm 1\}$) and size $323 \times 1295$. Note that, in the case of non-negative sparse vectors, the corresponding 0/1-matrices may be used instead to recover with non-negative least squares [44]. The sparsity of the unknown vector is varied between 1 and 180. For each sparsity level, many experiments are performed to compute the NMSE. In each run, the support of the unknown vector is chosen uniformly at random and the values are independent instances of a standard Gaussian random variable. For comparison, we have also included the corresponding performance of a generic sampling matrix (signed Bernoulli) of the same size. Numerically, the partially derandomized orthogonal array construction achieves essentially the same performance as its generic counterpart.

Figure 1 also shows the NMSE achieved for measurement matrices based on subsampling from an Alltop design (5). The data is obtained in the same way as above, but the sparse vectors are generated with i.i.d. complex standard Gaussian entries on the support. For comparison, the results for a (complex) standard Gaussian sampling matrix are included as well. Again, the performance of random Alltop measurements essentially matches its generic (Gaussian) counterpart.
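For completeness, a compact sketch of the experimental pipeline described above; function names, parameters and the cvxpy solver are our own illustrative choices.

```python
# NMSE experiment sketch: draw random s-sparse signals, recover them via
# basis pursuit (1) and average the normalized squared error. The matrix A
# may be a subsampled +-1 orthogonal array or a signed Bernoulli matrix.
import numpy as np
import cvxpy as cp

def basis_pursuit(A, y):
    z = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm(z, 1)), [A @ z == y]).solve()
    return z.value

def nmse(A, s, trials=20, seed=0):
    rng = np.random.default_rng(seed)
    n, errs = A.shape[1], []
    for _ in range(trials):
        x = np.zeros(n)
        x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
        x_hat = basis_pursuit(A, A @ x)
        errs.append(np.linalg.norm(x_hat - x) ** 2 / np.linalg.norm(x) ** 2)
    return float(np.mean(errs))

rng = np.random.default_rng(1)
A_bern = rng.choice([-1.0, 1.0], size=(32, 128)) / np.sqrt(32)
print(nmse(A_bern, s=4))          # small for s well below the recovery threshold
```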

V. TECHNICAL PROOFS

A. Proof of Lemma 1
The inclusion $T_s \subset 2\,\mathrm{conv}\left( \Sigma^n_s \right)$ remains valid in the complex case. Moreover, every $z \in \mathrm{conv}\left( \Sigma^n_s \right)$ necessarily obeys $\|z\|_1 \leq \sqrt{s}$, because the maximum value of a convex function over a convex set is achieved at the boundary. Hölder's inequality therefore implies
$$W_m(a, T_s) \leq 2\, \mathbb{E}\left[ \sup_{z \in \mathrm{conv}(\Sigma^n_s)} \mathrm{Re}\left( \langle h, z \rangle \right) \right] \leq 2\sqrt{s}\, \mathbb{E}\left[ \|h\|_\infty \right]. \quad (21)$$
We first bound the expected maximum absolute value of the real parts of the coefficients of $h = \frac{1}{\sqrt{m}}\sum_{i=1}^m \epsilon_i a_i$. Monotonicity and non-negativity of the exponential function then imply, for any $\theta > 0$,
$$\exp\left( \theta\, \mathbb{E}\left[ \max_{1 \leq k \leq n} \left| \mathrm{Re}\left( \langle e_k, h \rangle \right) \right| \right] \right) \leq \sum_{k=1}^n \sum_{\sigma = \pm 1} \prod_{i=1}^m \mathbb{E}\left[ \exp\left( \frac{\sigma\theta}{\sqrt{m}}\, \epsilon_i\, \mathrm{Re}\left( \langle e_k, a_i \rangle \right) \right) \right],$$
where we have also used that all $\epsilon_i$'s and $a_i$'s are independent. The remaining moment generating functions can be bounded individually. Fix $1 \leq k \leq n$, $\sigma \in \{\pm 1\}$ and $1 \leq i \leq m$ and exploit the Rademacher randomness to infer
$$\mathbb{E}\left[ \exp\left( \frac{\sigma\theta}{\sqrt{m}}\, \epsilon_i\, \mathrm{Re}\left( \langle e_k, a_i \rangle \right) \right) \right] = \mathbb{E}\left[ \cosh\left( \frac{\theta}{\sqrt{m}}\, \mathrm{Re}\left( \langle e_k, a_i \rangle \right) \right) \right] \leq \mathbb{E}\left[ \exp\left( \frac{\theta^2}{2m}\, \mathrm{Re}\left( \langle e_k, a_i \rangle \right)^2 \right) \right],$$
because $\sigma^2 = 1$. Incoherence moreover ensures $\mathrm{Re}\left( \langle e_k, a_i \rangle \right)^2 \leq |\langle e_k, a_i \rangle|^2 \leq 1$. This ensures that the remaining expectation value is upper-bounded by $\exp\left( \frac{\theta^2}{2m} \right)$. Inserting these individual bounds into the expression above yields
$$\mathbb{E}\left[ \max_{1 \leq k \leq n} \left| \mathrm{Re}\left( \langle e_k, h \rangle \right) \right| \right] \leq \frac{\log(2n)}{\theta} + \frac{\theta}{2}$$
for any $0 < \theta \leq \sqrt{2m}$. Choosing $\theta = \sqrt{2\log(2n)}$ is feasible, because $m \geq \log(2n)$, and minimizes this upper bound to $\sqrt{2\log(2n)}$. A completely analogous bound can be derived for the expected maximum absolute value of the imaginary part. Combining both yields
$$\mathbb{E}\left[ \|h\|_\infty \right] \leq \sqrt{2\log(2n)} + \sqrt{2\log(2n)} = 2\sqrt{2\log(2n)},$$
and inserting this bound into Eq. (21) ensures $W_m(a, T_s) \leq 4\sqrt{2s\log(2n)}$.

B. Proof of Theorem 7
The proof is based on rather straightforward modifications of Tropp's proof of Mendelson's small ball method [37]. Let $a \in \mathbb{C}^n$ be a complex-valued random vector. Suppose that $a_1, \ldots, a_m \in \mathbb{C}^n$ are independent copies of $a$, and let $A$ be the $m \times n$ matrix whose $m$ rows correspond to these vectors. The goal is to obtain a lower bound on $\inf_{z \in E} \|Az\|_2$, where $E \subset \mathbb{C}^n$ is an arbitrary, but fixed, set. First, note that the $\ell_1$- and $\ell_2$-norms on $\mathbb{R}^{2m}$ are related via $\|v\|_2 \geq (2m)^{-1/2} \|v\|_1$. For fixed $z \in E$ this ensures
$$\|Az\|_2 \geq \frac{1}{\sqrt{2m}} \sum_{i=1}^m \left( \left| \mathrm{Re}\left( \langle a_i, z \rangle \right) \right| + \left| \mathrm{Im}\left( \langle a_i, z \rangle \right) \right| \right). \quad (22)$$
Moreover, for any $\xi > 0$, the crude pointwise bound
$$|x| \geq 2\xi\, \mathbb{1}\left\{ |x| \geq 2\xi \right\} \quad \text{for all } x \in \mathbb{R} \quad (23)$$
allows for relating the individual summands to indicator functions, whose expectations reproduce small ball probabilities.
Adding and subtracting $\xi(m/2)^{1/2}\, Q_{2\xi}(z)$ in Eq. (22) and taking the infimum yields
$$\inf_{z \in E} \|Az\|_2 \geq \xi\sqrt{\tfrac{m}{2}}\, \inf_{z \in E} Q_{2\xi}(z) - \sup_{z \in E} \left( \xi\sqrt{\tfrac{m}{2}}\, Q_{2\xi}(z) - \frac{1}{\sqrt{2m}} \sum_{i=1}^m \left( \left| \mathrm{Re}\left( \langle a_i, z \rangle \right) \right| + \left| \mathrm{Im}\left( \langle a_i, z \rangle \right) \right| \right) \right).$$
Here we have applied Eq. (23) to the first term. Since $Q_{2\xi}(z)$ features contributions from both the real and the imaginary part, we can split up the remaining supremum accordingly. The suprema over real and imaginary parts individually correspond to empirical processes that can be controlled by Rademacher symmetrization; this step produces the term $W_m(a, E)$ in the assertion of Theorem 7.