Isotropic and Non-Isotropic Signaling in Multivariate α-Stable Noise

A wide range of communication systems are corrupted by non-Gaussian noise, ranging from wireless to power line. In some cases, including interference in uncoordinated OFDM-based wireless networks, the noise is both impulsive and multivariate. At present, little is known about the information capacity and corresponding optimal input distributions. In this paper, we derive upper and lower bounds of the information capacity by exploiting non-isotropic inputs. For the special case of sub-Gaussian α-stable noise models, a numerical study reveals that isotropic Gaussian inputs can remain a viable choice, although the performance depends heavily on the dependence structure of the noise.


INTRODUCTION
In many communication systems, additive Gaussian noise is the dominant form of signal corruption due to thermal fluctuations in the electronic devices comprising the receiver. Nevertheless, additive non-Gaussian noise has also been observed to play an important role in power line (Zimmermann and Dostert, 2002) and molecular communications (Farsad et al., 2015). Even in wireless communications, interference from uncoordinated transmitters, such as in the Internet of Things (IoT), has been suggested to admit non-Gaussian statistics (Clavier et al., 2021b). Another form of wireless communications where non-Gaussian noise arises is in underwater communications (Chitre et al., 2004).
A particularly important family of non-Gaussian noise models is the impulsive family, in which the probability of large-amplitude noise is significantly higher than predicted by corresponding Gaussian models; that is, impulsive noise is heavy-tailed. A key property of impulsive noise is that higher-order moments are often infinite or undefined, as in Student's t (Hall, 1966), generalized Gaussian (Dytso et al., 2018), and α-stable models (Middleton, 1977; Sousa, 1992; Ilow and Hatzinakos, 1998; Gulati et al., 2010; Pinto and Win, 2010).
Of all impulsive noise families, one of the most ubiquitous is the family of α-stable models. As a generalization of Gaussian models admitting the key property known as stability under convolution, these models arise via several mechanisms. The first mechanism, relevant for molecular communications, is via the distribution of the first hitting time of the standard Wiener process (Farsad et al., 2015). The second mechanism is via the generalized central limit theorem, which characterizes the behavior of partial sums of n independent and identically distributed random variables under the scaling $n^{-1/\alpha}$ (Mahmood et al., 2014). The third mechanism to obtain α-stable models, relevant for interference in wireless communication systems, was first identified by Middleton (Middleton, 1977) and further clarified in (Sousa, 1992; Ilow and Hatzinakos, 1998). In particular, given uncoordinated transmitting devices located according to a homogeneous Poisson point process on the plane, the interference under power-law path loss converges almost surely to an α-stable random variable by identification with the LePage series (Samorodnitsky and Taqqu, 1994). This third mechanism has recently seen application in interference studies for the IoT. Indeed, both theory and recent experimental data (Lauridsen et al., 2017) in the 868 MHz band, utilized by SigFox and LoRa devices, have indicated the presence of heavy-tailed interference which may be modeled via α-stable models (Clavier et al., 2021b).
Despite the utility of α-stable noise models in communications, the vast majority of work has focused on real-valued noise. In this setting, information capacity bounds have been derived in (de Freitas et al., 2017) and the structure of optimal input distributions characterized in (Fahs and Abou-Faycal, 2017). The design of symbol detection strategies and their performance has been addressed in (Niranjayan and Beaulieu, 2009; Ghannudi et al., 2010; Clavier et al., 2021a) and decoding algorithms developed in (Gu and Clavier, 2012; Mestrah et al., 2020). Noise parameter estimation algorithms have also been developed in (Kuruoglu, 2001), and power control strategies have been studied as well.
On the other hand, baseband signals in wireless communications are typically complex-valued, a setting for which few signal processing strategies and performance analyses have been developed, with notable exceptions in (Gulati et al., 2010; Mahmood et al., 2014) for the narrowband case. The situation is further complicated when transmissions utilize orthogonal frequency division multiplexing (OFDM), where signals are transmitted over multiple subcarriers. In such cases, the noise forms a random vector and real-valued α-stable models are insufficient. Nevertheless, it has recently been shown that multivariate α-stable models can naturally arise from statistical analysis of interference in complex baseband signals over multiple subcarriers. However, little is known about performance limits or optimal signaling strategies in the presence of multivariate α-stable noise. In particular, the information capacity of such channels remains an open question. This quantity is useful for selecting coding rates (via the noisy channel coding theorem) and for designing resource allocation strategies.
In this paper, as a step towards resolving these open questions, we study the information capacity and signaling in multivariate symmetric α-stable noise channels with 1 < α < 2. We first return to the question of the information capacity in real-valued symmetric α-stable noise channels, where we establish new upper and lower bounds that are tighter and more general than those given in (de Freitas et al., 2017). In particular, bounds are also given for power-constrained inputs as well as fractional moment constraints. In the case of a power constraint, we establish that the information capacity is within a constant of the information capacity for the Gaussian noise channel and that Gaussian inputs yield this behavior.
We then turn to the case of multivariate symmetric α-stable noise. We show that there exists a unique optimal input achieving the information capacity and also derive a general upper bound, which is applicable to all multivariate symmetric α-stable noise channels subject to fractional moment and power constraints. We then derive a general lower bound applicable for fractional moment constraints with exponent r < α. In the case of sub-Gaussian α-stable models, we also obtain a lower bound on the information capacity subject to a power constraint.
Our bounds suggest, at least from an analytical point of view, that it is desirable to match the dependence structure of the input distribution to that of the noise. Indeed, our lower bounds are obtained with non-isotropic inputs, often matched to the dependence structure of the noise distribution. To study the performance of non-isotropic inputs, we consider communication in sub-Gaussian α-stable noise subject to a power constraint, and numerically study the behavior of the bounds. In this particular case, we observe that isotropic Gaussian inputs nearly achieve the capacity upper bound, suggesting that matching the input to the dependence structure of the noise is not always desirable.

Notation
Vectors are denoted by bold lowercase letters and random vectors by bold uppercase letters, respectively (e.g., $\mathbf{x}$, $\mathbf{X}$). We denote the distribution of a random vector $\mathbf{X}$ by $P_{\mathbf{X}}$. If $\mathbf{X}$, $\mathbf{Y}$ are two random vectors equal in distribution, then we write $\mathbf{X} \overset{d}{=} \mathbf{Y}$.
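Let $\mathbf{z} \in \mathbb{R}^d$. Then $\|\mathbf{z}\|_r$, $1 \leq r \leq 2$, is given by
$$\|\mathbf{z}\|_r = \left(\sum_{i=1}^d |z_i|^r\right)^{1/r},$$
and $\|\cdot\|_r$ is called the r-norm on $\mathbb{R}^d$. For two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$, $\mathbf{a} \succeq \mathbf{b}$ indicates that $a_i \geq b_i$, $i = 1, \ldots, n$.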
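Let $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$. We use the Landau notation $f(x) = O(g(x))$, $x \to \infty$, to indicate that there exist constants $M > 0$ and $x_0$ such that $|f(x)| \leq M|g(x)|$ for all $x \geq x_0$.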

PROBLEM FORMULATION
In this section, we detail the problem of characterizing the information capacity and optimal input distributions in multivariate symmetric α-stable noise channels (1 < α < 2). To this end, we first recall preliminary definitions and properties of scalar and multivariate α-stable models that will be used in the sequel. For further details, we refer the reader to (Samorodnitsky and Taqqu, 1994).

α-Stable Models
The probability density function of an α-stable random variable is described by four parameters: the exponent $0 < \alpha \leq 2$; the scale parameter $c \in \mathbb{R}_+$; the skew parameter $\beta \in [-1, 1]$; and the shift parameter $\delta \in \mathbb{R}$. If X has an α-stable distribution, then we write $X \sim S_\alpha(c, \beta, \delta)$. In the case $\beta = \delta = 0$, X is said to be a symmetric α-stable random variable.
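Frontiers in Communications and Networks | www.frontiersin.org | October 2021 | Volume 2 | Article 718945
In general, α-stable random variables do not have closed-form probability density functions. Instead, they are more compactly represented by the characteristic function, given by (Samorodnitsky and Taqqu, 1994)
$$\mathbb{E}\left[e^{i\theta X}\right] = \begin{cases} \exp\left\{-c^\alpha|\theta|^\alpha\left(1 - i\beta\,\mathrm{sign}(\theta)\tan\frac{\pi\alpha}{2}\right) + i\delta\theta\right\}, & \alpha \neq 1, \\ \exp\left\{-c|\theta|\left(1 + i\beta\frac{2}{\pi}\mathrm{sign}(\theta)\log|\theta|\right) + i\delta\theta\right\}, & \alpha = 1. \end{cases} \tag{2}$$
Observe that in the special case $\alpha = 2$, the α-stable distribution is Gaussian. As such, the family of α-stable distributions generalizes the family of Gaussian distributions. In fact, like Gaussian models, if $X^{(1)}$ and $X^{(2)}$ are independent copies of an α-stable random variable X, then for $a, b > 0$, there exist constants $c > 0$ and $d \in \mathbb{R}$ such that
$$aX^{(1)} + bX^{(2)} \overset{d}{=} cX + d.$$
More precisely, the following property holds (Samorodnitsky and Taqqu, 1994): if $X_1 \sim S_\alpha(c_1, \beta_1, \delta_1)$ and $X_2 \sim S_\alpha(c_2, \beta_2, \delta_2)$ are independent, then $X_1 + X_2 \sim S_\alpha(c, \beta, \delta)$ with
$$c = \left(c_1^\alpha + c_2^\alpha\right)^{1/\alpha}, \quad \beta = \frac{\beta_1 c_1^\alpha + \beta_2 c_2^\alpha}{c_1^\alpha + c_2^\alpha}, \quad \delta = \delta_1 + \delta_2.$$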
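Because the density is not available in closed form, simulation is a common way to work with these models. The following is a minimal sketch, not taken from the paper, of the Chambers-Mallows-Stuck method for the symmetric case $\beta = \delta = 0$, assuming the scale parameterization in which the characteristic function is $\exp(-c^\alpha|\theta|^\alpha)$. The function names `sample_sas` and `empirical_cf` are illustrative. The sampler is checked against the characteristic function rather than against moments, since the empirical characteristic function is a bounded statistic and is therefore well behaved under heavy tails.

```python
import math
import random


def sample_sas(alpha, scale, rng):
    """Draw one symmetric alpha-stable sample whose characteristic function
    is exp(-(scale * |theta|) ** alpha), via Chambers-Mallows-Stuck."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    v = rng.uniform(-math.pi / 2.0, math.pi / 2.0)  # uniform angle
    w = rng.expovariate(1.0)                        # unit-mean exponential
    if alpha == 1.0:
        return scale * math.tan(v)                  # Cauchy special case
    x = (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
         * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
    return scale * x


def empirical_cf(samples, theta):
    """Real part of the empirical characteristic function at theta."""
    return sum(math.cos(theta * x) for x in samples) / len(samples)


rng = random.Random(0)
alpha, scale = 1.5, 0.5
samples = [sample_sas(alpha, scale, rng) for _ in range(100_000)]
cf_hat = empirical_cf(samples, 1.0)
cf_true = math.exp(-(scale * 1.0) ** alpha)
print(f"empirical CF {cf_hat:.4f} vs exact CF {cf_true:.4f}")
```

With $\alpha = 2$ the same routine returns Gaussian samples of variance $2c^2$, consistent with the remark that the Gaussian family is the special case $\alpha = 2$ under this parameterization.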
When $\beta = \delta = 0$ in Eq. 2, the resulting α-stable distribution is said to be symmetric. An important alternative characterization of symmetric α-stable random variables is via the LePage series.
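Theorem 1. [Theorem 1.4.2 (Samorodnitsky and Taqqu, 1994)]. Suppose $0 < \alpha < 2$, $(\Gamma_i)_{i=1}^\infty$ is a homogeneous Poisson point process with intensity 1, and $(W_i)_{i=1}^\infty$ are symmetric, independent and identically distributed random variables, independent of $(\Gamma_i)_{i=1}^\infty$, satisfying $\mathbb{E}[|W_1|^\alpha] < \infty$. Then the series
$$\sum_{i=1}^\infty \Gamma_i^{-1/\alpha} W_i$$
converges almost surely to a symmetric α-stable random variable.
In the multivariate setting, we consider random vectors $\mathbf{X}$ in $\mathbb{R}^d$, $d > 1$. Analogously to the scalar case ($d = 1$), a random vector $\mathbf{X} \in \mathbb{R}^d$ is a symmetric α-stable random vector if for all $a, b > 0$ there exists $c > 0$ such that
$$a\mathbf{X}^{(1)} + b\mathbf{X}^{(2)} \overset{d}{=} c\mathbf{X},$$
where $\mathbf{X}^{(1)}$ and $\mathbf{X}^{(2)}$ are independent copies of $\mathbf{X}$.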
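A sufficient condition for a random vector $\mathbf{X}$ in $\mathbb{R}^d$ to be a symmetric α-stable random vector is that all linear combinations of the elements of $\mathbf{X}$ are symmetric α-stable (Samorodnitsky and Taqqu, 1994). In general, d-dimensional symmetric α-stable random vectors can be represented via their characteristic function, given by (Samorodnitsky and Taqqu, 1994)
$$\mathbb{E}\left[e^{i\langle\boldsymbol{\theta}, \mathbf{X}\rangle}\right] = \exp\left(-\int_{\mathbb{S}^{d-1}} |\langle\boldsymbol{\theta}, \mathbf{s}\rangle|^\alpha\, \Gamma(d\mathbf{s})\right),$$
where Γ is the unique symmetric measure on the surface $\mathbb{S}^{d-1}$ of the d-dimensional unit sphere, known as the spectral measure.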
In the case that a d-dimensional α-stable random vector $\mathbf{X}$ is truly d-dimensional, there exists a joint probability density function $p_{\mathbf{X}}(\cdot)$ on $\mathbb{R}^d$. Note that a simple necessary and sufficient condition for $\mathbf{X}$ to be truly d-dimensional is for the support of the spectral measure to span $\mathbb{R}^d$ (Byczkowski et al., 1993). This condition means that degenerate α-stable random vectors (e.g., when $X_i = X_j$ for some $i \neq j$, $i, j \in \{1, \ldots, d\}$) are not considered.
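A key family of truly d-dimensional α-stable random vectors is the family of sub-Gaussian α-stable random vectors, defined as follows (Samorodnitsky and Taqqu, 1994). Let $A \sim S_{\alpha/2}\left((\cos(\pi\alpha/4))^{2/\alpha}, 1, 0\right)$ and let $\mathbf{G} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ be a Gaussian random vector in $\mathbb{R}^d$ independent of A. Then $\mathbf{X} \overset{d}{=} A^{1/2}\mathbf{G}$ is said to be a sub-Gaussian α-stable random vector with underlying Gaussian vector $\mathbf{G}$. If, in addition, $\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$ for some $\sigma > 0$, then $\mathbf{X}$ is said to be an isotropic sub-Gaussian α-stable random vector.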
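To illustrate how such vectors can be generated, the following is a minimal sketch assuming the standard representation $\mathbf{X} = A^{1/2}\mathbf{G}$ from (Samorodnitsky and Taqqu, 1994), with A a positive $(\alpha/2)$-stable variable normalized so that $\mathbb{E}[e^{-sA}] = e^{-s^{\alpha/2}}$ and $\mathbf{G}$ a zero-mean Gaussian vector with covariance $\boldsymbol{\Sigma}$. The positive stable variable is drawn with Kanter's method; this choice, the restriction to two dimensions, and the function names are assumptions made for illustration only. Under this representation the characteristic function is $\exp\{-(\boldsymbol{\theta}^T\boldsymbol{\Sigma}\boldsymbol{\theta}/2)^{\alpha/2}\}$, which the sketch uses as a sanity check.

```python
import math
import random


def sample_positive_stable(a, rng):
    """Kanter's method: positive a-stable variable (0 < a < 1) with
    Laplace transform E[exp(-s * A)] = exp(-s ** a)."""
    u = rng.uniform(0.0, math.pi)
    w = rng.expovariate(1.0)
    return (math.sin(a * u) / math.sin(u) ** (1.0 / a)
            * (math.sin((1.0 - a) * u) / w) ** ((1.0 - a) / a))


def sample_sub_gaussian_2d(alpha, sigma2, rho, rng):
    """One draw of X = sqrt(A) * G, where G is Gaussian with covariance
    sigma2 * [[1, rho], [rho, 1]] and A is positive (alpha/2)-stable."""
    a = sample_positive_stable(alpha / 2.0, rng)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    s = math.sqrt(sigma2)
    g1 = s * z1                                            # first Gaussian component
    g2 = s * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)  # correlated second component
    root = math.sqrt(a)
    return (root * g1, root * g2)


def exact_cf(theta, alpha, sigma2, rho):
    """exp(-(theta' Sigma theta / 2) ** (alpha / 2)) for the 2-D model above."""
    t1, t2 = theta
    quad = sigma2 * (t1 * t1 + 2.0 * rho * t1 * t2 + t2 * t2)
    return math.exp(-(quad / 2.0) ** (alpha / 2.0))


def empirical_cf_2d(samples, theta):
    """Real part of the empirical characteristic function at theta."""
    t1, t2 = theta
    return sum(math.cos(t1 * x1 + t2 * x2) for x1, x2 in samples) / len(samples)


rng = random.Random(2)
draws = [sample_sub_gaussian_2d(1.2, 2.0, 0.7, rng) for _ in range(50_000)]
for theta in [(1.0, 0.0), (1.0, 1.0)]:
    print(theta, empirical_cf_2d(draws, theta), exact_cf(theta, 1.2, 2.0, 0.7))
```

With $\boldsymbol{\Sigma} = 2c^2\mathbf{I}$ each component is symmetric α-stable with scale c, which explains the factor of 2 that appears when a covariance matrix is specified in terms of a target scale.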

The Information Capacity Problem
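Consider the memoryless, stationary, linear and point-to-point communication channel
$$\mathbf{Y} = \mathbf{X} + \mathbf{N}, \tag{10}$$
where $\mathbf{N}$ is a truly d-dimensional symmetric α-stable random vector with $1 < \alpha < 2$, admitting a multivariate probability density function $p_{\mathbf{N}}(\cdot)$, with $\mathbf{X}$ and $\mathbf{N}$ independent. The probability measure $\mu_{\mathbf{X}}$ of the random vector $\mathbf{X}$ is required to lie in the set
$$\Lambda_{\mathbf{X}}(\mathbf{P}, r) = \left\{\mu_{\mathbf{X}} : \mathbb{E}_{\mu_{\mathbf{X}}}[|X_i|^r] \leq P_i,\ i = 1, \ldots, d\right\} \tag{11}$$
for a given $\mathbf{P} \in \mathbb{R}^d_+$ and $1 \leq r \leq 2$. As such, the set $\Lambda_{\mathbf{X}}(\mathbf{P}, r)$ corresponds to the set of inputs satisfying element-wise moment constraints. Note that by virtue of $\mathbf{N}$ admitting a probability density function, $\mathbf{Y}$ also admits a probability density function $p_{\mathbf{Y}}(\cdot)$.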
The main focus of this paper is to investigate the information capacity and corresponding optimal inputs for communication channels of the form Eq. 10. To this end, let $\mathcal{P}(\mathbb{R}^d)$ be the set of probability measures on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ equipped with the topology of weak convergence (Billingsley, 1999).
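The information capacity is then defined as
$$C(\mathbf{P}, r) = \sup_{\mu_{\mathbf{X}} \in \Lambda_{\mathbf{X}}(\mathbf{P}, r)} I(\mathbf{X}; \mathbf{Y}), \tag{12}$$
where the mutual information $I(\mathbf{X}; \mathbf{Y})$ is given by
$$I(\mathbf{X}; \mathbf{Y}) = \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} p_{\mathbf{N}}(\mathbf{y} - \mathbf{x}) \log\frac{p_{\mathbf{N}}(\mathbf{y} - \mathbf{x})}{p_{\mathbf{Y}}(\mathbf{y})}\, d\mathbf{y}\, d\mu_{\mathbf{X}}(\mathbf{x}).$$
In the case an optimal input exists, it satisfies
$$\mu^*_{\mathbf{X}} \in \arg\max_{\mu_{\mathbf{X}} \in \Lambda_{\mathbf{X}}(\mathbf{P}, r)} I(\mathbf{X}; \mathbf{Y}).$$
Note that, by a generalization of Shannon's noisy channel coding theorem for vector non-Gaussian channels (Han, 2003), the information capacity may be interpreted as the maximum achievable rate with asymptotically zero average probability of error.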
In the remainder of this paper, we address the following questions: (i) What is the value of $C(\mathbf{P}, r)$ for varying $\mathbf{P}$ and r? (ii) Does an optimal input $\mu^*_{\mathbf{X}} \in \Lambda_{\mathbf{X}}(\mathbf{P}, r)$ exist? (iii) What is the structure of nearly optimal inputs?
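In this work, we will allow $\mu_{\mathbf{X}}$ to be non-isotropic; that is, we do not require that $\mathbf{U}\mathbf{X} \overset{d}{=} \mathbf{X}$ for all orthogonal matrices $\mathbf{U} \in \mathbb{R}^{d \times d}$, where $\mathbf{X}$ has probability measure $\mu_{\mathbf{X}}$. In the following section, we begin with scalar channels, which have not previously been comprehensively studied, before considering more general vector channels in Section 4.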

SCALAR CHANNELS
Before turning to multivariate α-stable noise channels, we first consider the scalar case. In particular, we first improve on the capacity bounds in (de Freitas et al., 2017) and in the process develop techniques that will be generalized to the multivariate setting in the sequel. To begin, we specialize the problem in Eq. 12 to the scalar case: a stationary and memoryless scalar additive symmetric α-stable noise channel is given by
$$Y = X + N, \tag{16}$$
where the noise N is a symmetric α-stable random variable with scale parameter $c_N$, admitting a probability density function $p_N(\cdot)$, with X and N independent. The input random variable X is required to satisfy the constraint
$$\mathbb{E}[|X|^r] \leq P, \tag{17}$$
where $1 \leq r \leq 2$. In terms of the probability measure of X, the constraint can be written as
$$\mu \in \Lambda_X(P, r) = \left\{\mu \in \mathcal{P}(\mathbb{R}) : \int_{\mathbb{R}} |x|^r\, d\mu(x) \leq P\right\}. \tag{18}$$
In this case, the information capacity of the channel (16) is defined as
$$C(P, r) = \sup_{\mu \in \Lambda_X(P, r)} I(X; Y). \tag{19}$$
It follows from (Fahs and Abou-Faycal, 2016) that an optimal solution of Eq. 19 exists and is unique. Indeed, the optimal input is known to be discrete (Fahs and Abou-Faycal, 2017).

Capacity Upper Bounds
In (de Freitas et al., 2017), an upper bound on $C(P, r)$ was established when $r = 1$ and $1 < \alpha < 2$.
Theorem 2. Let $\lambda > 0$ and $r = 1$. For the channel (16), the capacity $C(P, 1)$ in (19) admits the upper bound, parameterized by λ, that is given in (de Freitas et al., 2017).
It was shown in (de Freitas et al., 2017) that this bound is tight for moderate values of P and appropriate values of λ, but quickly diverges otherwise. An asymptotic upper bound, which is only guaranteed to hold as $P \to \infty$, was also established. In the following theorem, we establish an upper bound which holds for all $1 \leq r \leq 2$ and $P > 0$.
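Theorem 3. Let $1 \leq r \leq 2$ and $P > 0$. For the channel (16), the capacity $C(P, r)$ in (19) is upper bounded by
$$C(P, r) \leq C_{UB}(P, r) = \log\left(2e\left(P^{1/r} + \frac{2c_N}{\pi}\Gamma\left(1 - \frac{1}{\alpha}\right)\right)\right) - h(N).$$
Proof. Note that under the constraint that $\mathbb{E}[|Y|] = c_Y$, $c_Y > 0$, the entropy is maximized by the Laplace distribution (Cover and Thomas, 2006). This yields a bound of
$$I(X; Y) = h(Y) - h(N) \leq \log\left(2e\,\mathbb{E}[|Y|]\right) - h(N). \tag{22}$$
By the triangle inequality,
$$\mathbb{E}[|Y|] \leq \mathbb{E}[|X|] + \mathbb{E}[|N|]. \tag{23}$$
We also have by (Zolotarev, 1957)
$$\mathbb{E}[|N|] = \frac{2c_N}{\pi}\Gamma\left(1 - \frac{1}{\alpha}\right). \tag{24}$$
All that remains is to bound $\mathbb{E}[|X|]$. By construction, $\mathbb{E}[|X|^r] \leq P$. Using Hölder's inequality then yields
$$\mathbb{E}[|X|] \leq \left(\mathbb{E}[|X|^r]\right)^{1/r} \leq P^{1/r}. \tag{25}$$
Substituting Eqs 23-25 into Eq. 22 gives
$$C(P, r) \leq \log\left(2e\left(P^{1/r} + \frac{2c_N}{\pi}\Gamma\left(1 - \frac{1}{\alpha}\right)\right)\right) - h(N),$$
as required.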
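The steps of the proof (maximum entropy under a first absolute moment constraint, the triangle inequality, the first absolute moment of the noise, and Hölder's inequality) can be evaluated numerically once $h(N)$ is known. The sketch below assumes that the resulting bound has the form $\log(2e(P^{1/r} + \mathbb{E}|N|)) - h(N)$ with $\mathbb{E}|N| = 2c_N\Gamma(1 - 1/\alpha)/\pi$, and takes $h(N)$ as an input because it has no closed form for $1 < \alpha < 2$. The function names are illustrative. As a sanity check only, the example uses the Gaussian limit $\alpha = 2$, where $N \sim \mathcal{N}(0, 2c_N^2)$ and both $h(N)$ and the capacity are known.

```python
import math


def mean_abs_sas(alpha, scale):
    """E|N| for symmetric alpha-stable N with 1 < alpha <= 2."""
    if not 1.0 < alpha <= 2.0:
        raise ValueError("E|N| is finite only for alpha > 1")
    return 2.0 * scale * math.gamma(1.0 - 1.0 / alpha) / math.pi


def capacity_upper_bound(P, r, alpha, scale, h_noise):
    """Assumed form of the bound, in nats: log(2e(P^(1/r) + E|N|)) - h(N)."""
    if not 1.0 <= r <= 2.0 or P <= 0.0:
        raise ValueError("need 1 <= r <= 2 and P > 0")
    mean_abs_output = P ** (1.0 / r) + mean_abs_sas(alpha, scale)
    return math.log(2.0 * math.e * mean_abs_output) - h_noise


def gaussian_entropy(variance):
    """Differential entropy of N(0, variance) in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


# Gaussian sanity check (alpha = 2): the noise variance is 2 * scale**2.
scale = 0.01
h_n = gaussian_entropy(2.0 * scale ** 2)
for P in (1e-4, 1e-2, 1.0):
    ub = capacity_upper_bound(P, 2.0, 2.0, scale, h_n)
    awgn = 0.5 * math.log(1.0 + P / (2.0 * scale ** 2))
    print(f"P={P:g}: upper bound {ub:.3f} nats, Gaussian capacity {awgn:.3f} nats")
```

For $1 < \alpha < 2$ the argument `h_noise` would have to come from numerical integration of the density or from a sample-based entropy estimator.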

Capacity Lower Bounds
We now turn to lower bounding C(P, r). We first consider the case where 1 ≤ r < α.
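Theorem 4. Let $1 \leq r < \alpha$. For the channel (16), the capacity $C(P, r)$ in (19) is lower bounded by
$$C(P, r) \geq C_{LB}(P, r) = \frac{1}{\alpha}\log\left(1 + \frac{1}{c_N^\alpha}\left(\frac{P}{C(r, \alpha)}\right)^{\alpha/r}\right),$$
where
$$C(r, \alpha) = \frac{2^{r+1}\,\Gamma\left(\frac{r+1}{2}\right)\Gamma\left(-\frac{r}{\alpha}\right)}{\alpha\sqrt{\pi}\,\Gamma\left(-\frac{r}{2}\right)}.$$
Proof. Let $X \sim S_\alpha(c_X, 0, 0)$ with $c_X \in \mathbb{R}_+$. Consider the random variable $U \sim S_\alpha(1, 0, 0)$. By the scaling and translation properties of α-stable random variables, we can write
$$X \overset{d}{=} c_X U, \quad N \overset{d}{=} c_N U.$$
By the stability property,
$$Y = X + N \overset{d}{=} c_Y U,$$
and hence
$$h(Y) = h(U) + \log c_Y, \quad h(N) = h(U) + \log c_N,$$
where $c_Y = \left(c_X^\alpha + c_N^\alpha\right)^{1/\alpha}$. We then have
$$I(X; Y) = h(Y) - h(N) = \frac{1}{\alpha}\log\left(1 + \frac{c_X^\alpha}{c_N^\alpha}\right).$$
Using (Shao and Nikias, 1993, Theorem 4) and the constraint $\mathbb{E}[|X|^r] \leq P$, it follows that
$$\mathbb{E}[|X|^r] = C(r, \alpha)\, c_X^r \leq P,$$
so that the choice $c_X = (P/C(r, \alpha))^{1/r}$ is feasible and yields the stated bound.
Remark 1. When $r = 1$, Theorem 4 specializes to the lower bound in (de Freitas et al., 2017).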
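The construction in the proof, an α-stable input matched to the noise with its scale fixed by the fractional moment constraint, can be sketched numerically as follows. The sketch assumes the fractional moment identity $\mathbb{E}|X|^r = C(r, \alpha)\,c_X^r$ for $X \sim S_\alpha(c_X, 0, 0)$ and $0 < r < \alpha$, with $C(r, \alpha) = 2^{r+1}\Gamma((r+1)/2)\Gamma(-r/\alpha)/(\alpha\sqrt{\pi}\,\Gamma(-r/2))$, as the content of (Shao and Nikias, 1993, Theorem 4) rewritten in terms of the scale parameter. The resulting rate is $\frac{1}{\alpha}\log(1 + (c_X/c_N)^\alpha)$. The function names are illustrative.

```python
import math


def fractional_moment_constant(r, alpha):
    """C(r, alpha) such that E|X|^r = C(r, alpha) * c**r for a symmetric
    alpha-stable X with scale c, valid for 0 < r < alpha <= 2."""
    if not 0.0 < r < alpha <= 2.0:
        raise ValueError("need 0 < r < alpha <= 2")
    numerator = 2.0 ** (r + 1.0) * math.gamma((r + 1.0) / 2.0) * math.gamma(-r / alpha)
    denominator = alpha * math.sqrt(math.pi) * math.gamma(-r / 2.0)
    return numerator / denominator


def capacity_lower_bound(P, r, alpha, noise_scale):
    """Rate (nats) of a symmetric alpha-stable input meeting E|X|^r = P."""
    input_scale = (P / fractional_moment_constant(r, alpha)) ** (1.0 / r)
    return math.log1p((input_scale / noise_scale) ** alpha) / alpha


alpha, r, noise_scale = 1.5, 1.2, 0.01
for P in (1e-3, 1e-1, 10.0):
    rate = capacity_lower_bound(P, r, alpha, noise_scale)
    print(f"P={P:g}: achievable rate {rate:.3f} nats")
```

At $r = 1$ the constant reduces to $2\Gamma(1 - 1/\alpha)/\pi$, which agrees with the first absolute moment used in the upper bound.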
Since $1 \leq r < \alpha$ and $\alpha < 2$, it follows that Theorem 4 does not apply in the important case where the input X is constrained to satisfy $\mathbb{E}[X^2] \leq P$. In the following theorem, we establish a lower bound in this setting.
A key question is whether the capacity bounds we have established so far are tight. To this end, we make the following observation.
Proof. Observe that
$$C_{LB}(P, r) = \frac{1}{r}\log P + O(1), \quad P \to \infty.$$
Since
$$C_{UB}(P, r) = \frac{1}{r}\log P + O(1), \quad P \to \infty,$$
the corollary follows. By the same argument, we also have the following corollary.
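To make the constant gap concrete, the following worked expansion is a sketch under two assumptions that are not confirmed by the text as it stands: that the upper bound has the form $\log(2e(P^{1/r} + \mathbb{E}|N|)) - h(N)$ suggested by the proof of Theorem 3, and that the lower bound has the form $\frac{1}{\alpha}\log(1 + (c_X/c_N)^\alpha)$ with $c_X = (P/C(r, \alpha))^{1/r}$, where $C(r, \alpha)$ denotes the fractional moment constant from (Shao and Nikias, 1993, Theorem 4). The variable $U \sim S_\alpha(1, 0, 0)$ is the unit-scale variable from the proof of Theorem 4, so that $h(N) = h(U) + \log c_N$.

```latex
\begin{align*}
C_{UB}(P, r) &= \tfrac{1}{r}\log P + \log(2e) - h(N) + o(1), \\
C_{LB}(P, r) &= \tfrac{1}{r}\log P - \tfrac{1}{r}\log C(r, \alpha) - \log c_N + o(1), \\
C_{UB}(P, r) - C_{LB}(P, r) &\to \log(2e) - h(U) + \tfrac{1}{r}\log C(r, \alpha),
\qquad P \to \infty.
\end{align*}
```

Under these assumptions the limiting gap depends only on α and r, and not on P or $c_N$.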
As a consequence, for sufficiently large values of P and $r = 2$, the rate achievable using a Gaussian input is within a constant of the capacity $C(P, 2)$. A further observation, which will be useful in the sequel, is that matching the input distribution to the noise distribution yields a rate that forms a good approximation of the capacity. Finally, the capacity of symmetric α-stable noise channels is within a constant of the capacity for an additive Gaussian noise channel.

Numerical Results
In order to further verify the tightness of the upper bound in Theorem 3, we compare the bound with the numerical computation of the capacity via the Blahut-Arimoto algorithm (Blahut, 1972; Arimoto, 1972). Figure 1 plots the information capacity against the power P for varying α. The scale parameter is set as $c_N = 0.01$. Observe that the upper bound and the numerical approximation are in good agreement. Note that the lower bound obtained with a Gaussian input is also in good agreement, despite the fact that the optimal input is discrete (Fahs and Abou-Faycal, 2017).
Figure 2 plots the capacity upper bound in Theorem 2 [from (de Freitas et al., 2017)] and our new upper bound in Theorem 3 in the case of the constraint $\mathbb{E}[|X|] \leq P$ with $c_N = 0.01$ and $\alpha = 1.8$. The parameter λ required in the bound from Theorem 2 is set to $\lambda = 1$. Observe that for all plotted values of P, the new bound in Theorem 3 is below that of Theorem 2, implying that the new bound is tighter. Note that the improvement over the bound in (de Freitas et al., 2017) is already evident from the form of the bounds for large P, due to the fact that P dominates log P.
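For readers unfamiliar with the Blahut-Arimoto algorithm, the following is a minimal sketch of its basic form for a finite-alphabet discrete memoryless channel without input constraints. It is not the computation behind Figure 1: applying it to the α-stable channel would additionally require discretizing the input and output alphabets, evaluating the noise density numerically, and incorporating the moment constraint (for example through a Lagrange multiplier), none of which is shown here. The function names are illustrative.

```python
import math


def _divergences(W, p):
    """D[x] = KL divergence (nats) between row W[x] and the output pmf induced by p."""
    nx, ny = len(W), len(W[0])
    q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    return [sum(W[x][y] * math.log(W[x][y] / q[y])
                for y in range(ny) if W[x][y] > 0.0)
            for x in range(nx)]


def blahut_arimoto(W, tol=1e-10, max_iter=10_000):
    """Capacity (nats) and maximizing input pmf of a discrete memoryless
    channel with transition matrix W[x][y] = P(Y = y | X = x)."""
    nx = len(W)
    p = [1.0 / nx] * nx                                # start from the uniform input
    for _ in range(max_iter):
        d = _divergences(W, p)
        lower = sum(p[x] * d[x] for x in range(nx))    # mutual information at p
        if max(d) - lower < tol:                       # max(d) upper bounds capacity
            break
        z = sum(p[x] * math.exp(d[x]) for x in range(nx))
        p = [p[x] * math.exp(d[x]) / z for x in range(nx)]  # multiplicative update
    d = _divergences(W, p)
    return sum(p[x] * d[x] for x in range(nx)), p


# Binary symmetric channel with crossover 0.1 and a Z-channel with crossover 0.5.
bsc = [[0.9, 0.1], [0.1, 0.9]]
z_channel = [[1.0, 0.0], [0.5, 0.5]]
print("BSC capacity (nats):", blahut_arimoto(bsc)[0])
print("Z-channel capacity (nats):", blahut_arimoto(z_channel)[0])
```

The stopping rule uses the standard sandwich between the mutual information at the current input and the largest per-symbol divergence, so the returned value is within `tol` of the capacity of the discrete channel when the loop terminates early.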

VECTOR CHANNELS
In this section, we return to the general problem in Eq. 12 for vector channels with d > 1.

Existence and Uniqueness of Optimal Inputs
While existence and uniqueness of optimal inputs is well understood in the scalar case (Fahs and Abou-Faycal, 2016), it has not yet been established in the vector case. We prove this result in the following theorem by utilizing the theory of weak convergence (Billingsley, 1999).
Theorem 6. For the optimization problem in (12), there exists a unique input probability measure $\mu^*$ corresponding to an input random vector $\mathbf{X}^*$ on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ such that $C(\mathbf{P}, r) = I(\mathbf{X}^*; \mathbf{Y})$.
Proof. The proof proceeds in three steps: (i) weak compactness of the constraint set $\Lambda_{\mathbf{X}}(\mathbf{P}, r)$; (ii) weak continuity of $I(\mathbf{X}; \mathbf{Y})$ on $\Lambda_{\mathbf{X}}(\mathbf{P}, r)$, yielding existence of $\mu^*_{\mathbf{X}}$; and (iii) uniqueness of $\mu^*_{\mathbf{X}}$.
To establish closure, we apply a variation of the Portmanteau theorem (Billingsley, 1999). Let $\{\mu_n\}_{n=1}^\infty$ be a weakly convergent sequence in $\Lambda_{\mathbf{X}}(\mathbf{P}, r)$ with limit $\mu_0$. By a consequence of the Portmanteau theorem, it follows that
$$\mathbb{E}_{\mu_0}[|X_i|^r] \leq \liminf_{n \to \infty} \mathbb{E}_{\mu_n}[|X_i|^r] \leq P_i, \quad i = 1, \ldots, d.$$
Hence, $\mu_0 \in \Lambda_{\mathbf{X}}(\mathbf{P}, r)$. Since the choice of sequence is arbitrary, it follows that $\Lambda_{\mathbf{X}}(\mathbf{P}, r)$ is closed in the topology of weak convergence. Since $\Lambda_{\mathbf{X}}(\mathbf{P}, r)$ is tight and closed in the topology of weak convergence, it then follows by Prokhorov's theorem (Billingsley, 1999) that $\Lambda_{\mathbf{X}}(\mathbf{P}, r)$ is compact.
FIGURE 1 | Capacity of symmetric α-stable noise channels subject to a power constraint P, with $c_N = 0.01$.
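(ii) The second step is to establish that $I(\mathbf{X}; \mathbf{Y})$ is weakly continuous on $\Lambda_{\mathbf{X}}(\mathbf{P}, r)$. In particular, we need to show that for any weakly convergent sequence of probability measures $(\mu_n)_{n=1}^\infty$ with limit $\mu_0$,
$$\lim_{n \to \infty} -\int p_{\mathbf{Y}_n}(\mathbf{y}) \log p_{\mathbf{Y}_n}(\mathbf{y})\, d\mathbf{y} = -\int p_{\mathbf{Y}_0}(\mathbf{y}) \log p_{\mathbf{Y}_0}(\mathbf{y})\, d\mathbf{y}, \tag{45}$$
where $\mathbf{Y}_n$ is the output corresponding to an input $\mathbf{X}_n$ with probability measure $\mu_n$. Note that $\mathbf{Y}_n = \mathbf{X}_n + \mathbf{N}$ admits a probability density function since $\mathbf{N}$ is truly d-dimensional.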
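Observe that if the limit and the integral in Eq. 45 can be swapped, the result follows from the definition of weak convergence provided that the probability density function of $\mathbf{N}$, $p_{\mathbf{N}}$, is bounded and continuous. This is indeed the case since the characteristic function of $\mathbf{N}$ is absolutely integrable.
To complete the proof, we must justify swapping the limit and the integral in Eq. 45. Let $1 < s < \alpha$. We need to establish that for all $n \geq 0$ and any $\delta > 0$, there exists $R(\delta) > 0$ such that
$$\left|\int_{\|\mathbf{y}\|_s > R(\delta)} p_{\mathbf{Y}_n}(\mathbf{y}) \log p_{\mathbf{Y}_n}(\mathbf{y})\, d\mathbf{y}\right| < \delta.$$
To proceed, let q denote a Cauchy density on $\mathbb{R}^d$. The tail integral is then bounded by a sum of terms, the last of which follows from the fact that $a \log a \geq -1/e$, $a > 0$. By the Markov inequality, the term involving the tail probability of $\mathbf{Y}_n$ is bounded in terms of a constant L and tends to zero as $R(\delta) \to \infty$. Here, $L < \infty$ by the Jensen and Hölder inequalities, since the probability measure μ corresponding to $\mathbf{X}$ lies in $\Lambda_{\mathbf{X}}(\mathbf{P}, r)$. Similarly, the term
$$\frac{1}{e}\int_{\|\mathbf{y}\|_s > R(\delta)} q(\mathbf{y})\, d\mathbf{y}$$
tends to zero as $R(\delta) \to \infty$. Moreover, the remaining term, given in Eq. 53, is bounded by Hölder's inequality. As such, Eq. 53 is finite and tends to zero as $R(\delta) \to \infty$.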
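After an application of the dominated convergence theorem, for any $\delta > 0$,
$$\lim_{n \to \infty} \int_{\|\mathbf{y}\|_s \leq R(\delta)} p_{\mathbf{Y}_n}(\mathbf{y}) \log p_{\mathbf{Y}_n}(\mathbf{y})\, d\mathbf{y} = \int_{\|\mathbf{y}\|_s \leq R(\delta)} p_{\mathbf{Y}_0}(\mathbf{y}) \log p_{\mathbf{Y}_0}(\mathbf{y})\, d\mathbf{y}. \tag{55}$$
Since the identities in Eqs 49, 52, 53, 55 hold for all $\delta > 0$, weak continuity of $I(\mathbf{X}; \mathbf{Y})$ follows by taking $\delta \to 0$ (and hence $R(\delta) \to \infty$). The existence part of Theorem 6 then holds by applying the extreme value theorem.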
(iii) The uniqueness of the optimal input follows from the fact that the entropy $h(\mathbf{Y})$ is a strictly concave function of the distribution $P_{\mathbf{Y}}$. By the fact that the characteristic function of $\mathbf{N}$ is strictly positive, $P_{\mathbf{Y}}$ is a one-to-one function of the distribution $P_{\mathbf{X}}$. Hence, $h(\mathbf{Y})$ is a strictly concave function of $P_{\mathbf{X}}$. As the mutual information can be written as
$$I(\mathbf{X}; \mathbf{Y}) = h(\mathbf{Y}) - h(\mathbf{N}),$$
it follows that $I(\mathbf{X}; \mathbf{Y})$ is a strictly concave function of $P_{\mathbf{X}}$ since $h(\mathbf{N})$ does not depend on $P_{\mathbf{X}}$. Since this holds for any input lying in $\Lambda_{\mathbf{X}}(\mathbf{P}, r)$, it then follows that the optimal input probability measure $\mu^*_{\mathbf{X}}$ is unique.

Capacity Upper Bound
We now obtain a general upper bound on the capacity in multivariate α-stable noise, which holds for constraints with 1 ≤ r ≤ 2.
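Theorem 7. Let $1 < \alpha < 2$, $1 \leq r \leq 2$, $\mathbf{P} \succ \mathbf{0}$ and $c_{N_i}$ be the scale parameter for the ith element of $\mathbf{N}$. The capacity $C(\mathbf{P}, r)$ in (12) is upper bounded by
$$C(\mathbf{P}, r) \leq \sum_{i=1}^d \log\left(2e\left(P_i^{1/r} + \frac{2c_{N_i}}{\pi}\Gamma\left(1 - \frac{1}{\alpha}\right)\right)\right) - h(\mathbf{N}).$$
Proof. Recall that
$$I(\mathbf{X}; \mathbf{Y}) = h(\mathbf{Y}) - h(\mathbf{N}) \leq \sum_{i=1}^d h(Y_i) - h(\mathbf{N}).$$
For each term $h(Y_i)$, the same argument as Theorem 3 yields
$$h(Y_i) \leq \log\left(2e\left(\mathbb{E}[|X_i|] + \frac{2c_{N_i}}{\pi}\Gamma\left(1 - \frac{1}{\alpha}\right)\right)\right).$$
The result then follows since for all $\mathbf{X}$ with probability measure $\mu_{\mathbf{X}} \in \Lambda_{\mathbf{X}}(\mathbf{P}, r)$,
$$\mathbb{E}[|X_i|] \leq P_i^{1/r}, \quad i = 1, \ldots, d.$$
As for the scalar case, the term $h(\mathbf{N})$ is not available in closed form and must be numerically evaluated. In the numerical study in Section 4.4, $h(\mathbf{N})$ will be estimated via nearest neighbor methods.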
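As an illustration of how a nearest neighbor entropy estimate can be formed from noise samples, the following is a minimal sketch of the classical unweighted Kozachenko-Leonenko 1-nearest-neighbor estimator, using a brute-force $O(n^2)$ neighbor search. This is an assumption about the general technique only: the exact estimator variant, sample size, and search method used for the numerical study are not specified here, and the function name is illustrative. The example checks the estimator on a two-dimensional standard Gaussian, whose entropy $\log(2\pi e)$ is known.

```python
import math
import random


def knn_entropy(points):
    """Kozachenko-Leonenko 1-nearest-neighbour estimate of differential
    entropy (nats) from a list of d-dimensional tuples; brute-force search."""
    n, d = len(points), len(points[0])
    # Log-volume of the unit Euclidean ball in d dimensions.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    euler_gamma = 0.5772156649015329
    total = 0.0
    for i, x in enumerate(points):
        nearest = min(math.dist(x, y) for j, y in enumerate(points) if j != i)
        total += math.log(nearest)
    return d * total / n + log_unit_ball + math.log(n - 1) + euler_gamma


rng = random.Random(3)
gaussian_points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
estimate = knn_entropy(gaussian_points)
print(f"estimate {estimate:.3f} nats, exact {math.log(2.0 * math.pi * math.e):.3f} nats")
```

For heavy-tailed noise the same routine can be applied to simulated noise vectors, although larger sample sizes (and hence a faster neighbor search, such as a k-d tree) would likely be needed for a stable estimate.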

Capacity Lower Bounds
We now generalize the results in Section 3.2 to the case of vector channels. As for scalar channels, we consider the two cases: $1 \leq r < \alpha$; and $r = 2$.
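Theorem 8. Let $1 < \alpha < 2$, $1 \leq r < \alpha$ and $\mathbf{P} \succ \mathbf{0}$. The capacity $C(\mathbf{P}, r)$ in (12) is lower bounded by a quantity $C_{LB}(\mathbf{P}, r)$; that is, $C(\mathbf{P}, r) \geq C_{LB}(\mathbf{P}, r)$.
Proof. Choose a matrix $\mathbf{A}$ and set $\mathbf{X} \overset{d}{=} \mathbf{A}\mathbf{N}$. In order to ensure that $\mu_{\mathbf{X}} \in \Lambda_{\mathbf{X}}(\mathbf{P}, r)$, we recall (Shao and Nikias, 1993, Theorem 4), which relates the fractional moments of each element of $\mathbf{X}$ to its scale parameter. It then follows that the constraint is satisfied, as required.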

Numerical Results
In this section, we study the behavior of the bounds in the case of two-dimensional sub-Gaussian α-stable noise, where inputs are subject to a power constraint. Figure 3 plots the capacity bounds in the previous section for varying values of P in the presence of sub-Gaussian α-stable noise, with $\alpha = 1.2$ and
$$\boldsymbol{\Sigma} = 2 \cdot 0.01^2 \cdot \begin{bmatrix} 1 & 0.7 \\ 0.7 & 1 \end{bmatrix}. \tag{72}$$
In order to compute the entropy $h(\mathbf{N})$, the 1-nearest neighbor method (Berrett et al., 2019) was used. Observe that there is a gap of approximately one nat between the upper bound in Theorem 7 and the lower bound in Theorem 9 with $\boldsymbol{\Sigma}_{\mathbf{X}}$ chosen to be proportional to $\boldsymbol{\Sigma}$. The third curve, in red, corresponds to the case of a two-dimensional isotropic Gaussian input where each component has variance P. Observe that the mutual information obtained with this input is close to the upper bound. This suggests that for sub-Gaussian α-stable noise channels, Gaussian inputs perform well and, moreover, independent components are desirable. This can be understood by an inspection of Theorem 9, where choosing $\boldsymbol{\Sigma}_{\mathbf{X}}$ to be diagonal maximizes the determinant when $\boldsymbol{\Sigma} \succ 0$.
Figure 4 plots the capacity bounds in the previous section subject to a power constraint $P = 0.01$ for varying values of ρ in the presence of sub-Gaussian α-stable noise, with $\alpha = 1.2$ and
$$\boldsymbol{\Sigma} = 2 \cdot 0.01^2 \cdot \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}. \tag{73}$$
The results are consistent with Figure 3, with the isotropic Gaussian input performing well for all values of ρ. We also observe that the curves increase for sufficiently large values of ρ, suggesting that increasing the dependence can lead to performance improvements. This is relevant for communication systems, such as in (Zheng et al., 2019, 2020), where noise is dominated by interference, which may be modified via changes to access policies.

CONCLUSION
Multivariate α-stable models have been suggested to capture the heavy-tailed nature of interference in OFDM-based wireless communication systems. In this paper, we studied the capacity of fractional moment and power constrained signaling in the presence of such noise. By considering non-isotropic inputs, we obtained upper and lower bounds, which provide insights into the behavior of the capacity and its relation to Gaussian noise models. Via a numerical study in two-dimensional channels with sub-Gaussian α-stable noise, we compared the performance of isotropic and non-isotropic Gaussian inputs. This suggests that, at least for this special case, isotropic Gaussian inputs remain a desirable choice.
FIGURE 3 | Capacity bounds for two-dimensional sub-Gaussian α-stable noise channels subject to a power constraint P, with Σ given in Eq. 72 and $\alpha = 1.2$.
FIGURE 4 | Capacity bounds for two-dimensional sub-Gaussian α-stable noise channels for varying noise dependence ρ subject to a power constraint $P = 0.01$, with Σ given in Eq. 73 and $\alpha = 1.2$.