Numerical simulations of scattering of light from two-dimensional surfaces using the Reduced Rayleigh Equation

A formalism is introduced for the non-perturbative, purely numerical, solution of the reduced Rayleigh equation for the scattering of light from two-dimensional penetrable rough surfaces. As an example, we apply this formalism to study the scattering of p- or s-polarized light from two- dimensional dielectric or metallic randomly rough surfaces by calculating the full angular distribution of the co- and cross-polarized intensity of the scattered light. In particular, we present calculations of the mean differential reflection coefficient for glass and silver surfaces characterized by (isotropic or anisotropic) Gaussian and cylindrical power spectra. The proposed method is found, within the validity of the Rayleigh hypothesis, to give reliable results. For a non-absorbing metal surface the conservation of energy was explicitly checked, and found to be satisfied to within 0.03%, or better, for the parameters assumed. This testifies to the accuracy of the approach and a satisfactory discretization.


I. INTRODUCTION
Wave scattering from rough surfaces is an old discipline which keeps attracting a great deal of attention from the scientific and technological community. Several important technologies in our society rely on such knowledge, with radar being a prime example. In the past, the interaction of light with rough surfaces was often considered an extra complication that had to be taken into account in order to properly interpret or invert scattering data. However, with the advent of nanotechnology, rough structures can be used to design novel materials with tailored optical properties. Examples include: metamaterials [1,2], photonic crystals [3], spoof plasmons [4], optical cloaking [5][6][7], and designer surfaces [8,9]. These developments have made it even more important to have available efficient and accurate simulation tools to calculate both the far-and near-field behavior of the scattered and transmitted fields for any frequency of the incident radiation, including potential resonance frequencies of the structure.
Lord Rayleigh was the first to perform systematic studies of wave scattering from rough surfaces when, in the late 1800s, he studied the intensity distribution of a wave scattered from a sinusoidal surface [10,11]. More than three decades later, Mandel'shtam studied light scattering from randomly rough surfaces [12] thereby initiating the field of wave scattering from surface disordered systems. Since the initial publication of these seminal works, numerous studies on wave scattering from randomly rough surfaces have appeared in the literature [13][14][15][16][17][18][19], and several new multiple scattering phenomena have been predicted and confirmed experimentally. These phenomena include the enhanced backscattering and enhanced transmission phenomena, the satellite peak phenomenon, and coherent effects in the intensity-intensity correlation functions [19][20][21][22][23][24].
These studies, and the methods they use, can be categorized as either perturbative or purely numerical (and non-pertubative). While the former group of methods is mainly limited to weakly rough surfaces, and therefore have limited applicability, the latter group of methods can be applied to a wider class of surface roughnesses. Rigorous numerical methods can in principle be used to study the wave scattering from surfaces of any degree of surface roughness. Such simulations are routinely performed for systems where the interface has a one-dimensional roughness, i.e., where the surface structure is constant along one of the two directions of the mean plane [19,25]. However, for the practically more relevant situation of a two-dimensional rough surface, the purely numerical and rigorous methods are presently less used due to their computationally intensive nature. The reason for this complexity is the fact that for a randomly rough surface there is no symmetry or periodicity in the surface structure that can be used to effectively reduce the simulation domain. For a periodic surface, it is sufficient to simulate a single unit cell, while for a random surface the unit cell is in principle infinite.
The FDTD and FEM methods discretize the whole volume of the simulation domain. Due to the complex and irregular shape of a (randomly) rough surface, it is often more convenient, and may give more accurate re- The main aim of this paper is to present a numerical method and formalism for the solution of the twodimensional reduced Rayleigh equation for reflection. While we exclusively consider reflection, the formalism for transmission will be almost identical, and the resulting equation will have a similar form as for reflection. Additionally, the equation for transmission or reflection for a film geometry, i.e., for a film of finite thickness on top of a substrate, where only one interface is rough, will also have a similar form. The method presented will be illustrated by applying it to the study of the scattering of p-or s-polarized light from two-dimensional metallic or dielectric media separated from vacuum by an isotropic or anisotropic randomly rough surface. This paper is organized as follows: First, in Sec. II we present the scattering geometry to be considered. We will then present some relevant scattering theory, including the reduced Rayleigh equation for the geometry under study (Sec. III), followed by a detailed description of how the equation can be solved numerically (Sec. IV). Next, we will present some simulation results obtained by the introduced method (Sec. V). We then discuss some of the computational challenges of this method (Sec. VI), and, finally, in Sec. VII we draw some conclusions. 1. (Color online) A sketch of the scattering geometry assumed in this work. The figure also shows the coordinate system used, angles of incidence (θ0, φ0) and scattering (θs, φs), and the corresponding lateral wavevectors k and q , respectively.

II. SCATTERING GEOMETRY
We consider a system where a rough surface separates two regions. Region 1 is assumed to be vacuum (ε 1 = 1), and region 2 is filled with a metal or dielectric characterized by a complex dielectric function ε 2 (ω), where the angular frequency is ω = 2πc/λ, with λ being the wavelength of the incident light in vacuum and c the speed of light in vacuum. The height of the surface measured in the positive x 3 direction from the x 1 x 2 -plane is given by the single-valued function x 3 = ζ(x ), where x = (x 1 , x 2 , 0), which is assumed to be at least once differentiable with respect to x 1 and x 2 . Angles of incidence (θ 0 , φ 0 ) and scattering (θ s , φ s ) are defined positive according to the convention given in Fig. 1.
In principle, the theory to be presented in Sec. III can be used to calculate the scattering of light from any surface, provided it is not too rough. However, in this paper, we will consider randomly rough surfaces where ζ(x ) constitutes a stationary random process defined by where the angle brackets denote an average over an ensamble of surface realizations. In writing Eqs. (1) we have defined the root-mean-square height of the surface, δ = ζ 2 (x ) 1/2 , and W (x − x ) denotes the heightheight auto-correlation function of the surface, normalized so that W (0) = 1 [19]. According to the Wiener-Khinchin theorem [47], the power spectrum of the surface profile function is given by The power spectra that will be considered in this work are of either the Gaussian form [33] where a i (i = 1, 2) denotes the lateral correlation length for direction i, or the cylindrical form [38] g(k ) = 4π where k = |k |, θ denotes the Heaviside unit step function, and k ± are wavenumber cutoff parameters, with k − < k + . The cylindrical form in Eq. (4) is a two-dimensional generalization of the power spectrum used in the experiments where West and O'Donnell confirmed the existence of the enhanced backscattering phenomenon for weakly rough surfaces [21].

III. SCATTERING THEORY
We consider a linearly p-or s-polarized plane wave which is incident on the surface from region 1, with the electric field given by and Here, and in the rest of the paper, a caret over a vector indicates a unit vector. The expressions in front of the amplitudes E inc α (k ) (α = p, s) in Eq. (5b) correspond to unit polarization vectors for incident light of linear polarization α. Moreover, k = (k 1 , k 2 , 0) denotes the lateral component of the wave vector k = k − α(k )x 3 . When the lateral wavenumber satisfies k ≤ ω/c, as will be assumed here, k is related to the angles of incidence according to where c denotes the speed of light in vacuum and θ 0 and φ 0 are the polar and azimuthal angles of incidence, respectively ( Fig. 1). When writing the field of incidence, E inc (x; t), a time harmonic dependence of the form exp(−iωt) was assumed. A similar time dependence will be assumed for all field expressions, but not indicted explicitly.
Above the surface roughness region, i.e., for x 3 > max ζ(x ), the scattered field can be written as a superposition of upwards propagating reflected plane waves: The integration in Eq. (7a) is over the entire plane, including the evanescent region q > ω/c. Therefore, both propagating and evanescent modes are included in E sc (x|ω). We will assume that a linear relationship exists between the amplitudes of the incident and the scattered fields, and we write (for α = p, s) Here we have introduced the so-called scattering amplitude R αβ (q |k ), which describes how incident βpolarized light characterized by a lateral wave vector k is converted by the surface roughness into scattered light of polarization α and lateral wave vector q . When q ≤ ω/c, the wave vector q is related to the angles of scattering (θ s , φ s ) by q = ω c sin θ s (cos φ s , sin φ s , 0) .
Below the surface region, i.e., for x 3 < min ζ(x ), the transmitted electric field can be written as In writing Eqs. (10) we have introduced wave vectors of the transmitted field p = p − α 2 (p )x 3 , where In complete analogy to what was done for reflection, a transmission amplitude T αβ (p |k ) may be defined via the following linear relation between the amplitudes of the incident and transmitted fields (α = p, s) Since the form of the electric fields given by Eqs. (5), (7), and (10) apply far away from the surface region, they are referred to as the asymptotic forms of the electric field. These equations automatically satisfy the boundary conditions at infinity.
In passing we note that once the incident field has been specified, the scattered and transmitted fields are fully specified outside the surface roughness region if the reflection [R αβ (q |k )] and transmission [T αβ (p |k )] amplitudes are known. We will now address how the reflection amplitude can be calculated.

A. The Rayleigh Hypothesis
Above the surface, i.e., in the region x 3 > max ζ(x ), the total electric field is equal to the sum of the incident and the scattered field, E inc (x|ω) + E sc (x|ω). Below the surface, in the region x 3 < min ζ(x ), it equals the transmitted field, E tr (x|ω). In the surface roughness region, min ζ(x ) ≤ x 3 ≤ max ζ(x ), these forms of the total field will not generally be valid. In particular, when we are above the surface but still below its maximum point, i.e., ζ(x ) ≤ x 3 < max ζ(x ), the expression for the scattered field will also have terms containing exp iq · x − iα 1 (q )x 3 . Similarly, the transmitted field in the surface region has to contain an additional term similar to Eq. (10a) but with the exponential function replaced by exp iq · x + iα 2 (q )x 3 (and a different amplitude).
If the surface roughness is sufficiently weak, however, the asymptotic form of the fields, Eqs. (5), (7), and (10), can be assumed to be a good approximation to the total electric field in the surface roughness region. This assumption is known as the Rayleigh hypothesis [10,11,17], in honor of Lord Rayleigh, who first used it in his seminal studies of wave scattering from sinusoidal surfaces [10,11]. For a (one-dimensional) sinusoidal surface, x 3 = ζ 0 sin(Λx 1 ), the criterion for the validity of the Rayleigh hypothesis, and thus equations that can be derived from it (like the reduced Rayleigh equation to be introduced below), is known to be ζ 0 Λ < 0.448, independent of the wavelength of the incident light [48,49]. For a randomly rough surface, however, the absolute limit of validity of this hypothesis is not generally known, though some numerical studies have been devoted to finding the region of validity for random surfaces [50]. Even if no absolute criterion for the validity of the Rayleigh hypothesis for randomly rough surfaces is known, it remains true that it is a small-slope hypothesis. In particular, if the randomly rough surface is characterized by an rms height δ, and a correlation length a (see Sec. II and Ref. [19] for details), there seems to be a consensus in the literature on the Rayleigh hypothesis being valid if δ/a 1 [17,50]. We stress that the validity of the Rayleigh hypothesis does not require the amplitude of the surface roughness to be small, only its slope.

B. The Reduced Rayleigh Equations
Under the assumption that the Rayleigh hypothesis is valid, the total electric field in the surface region, min ζ(x ) < x 3 < max ζ(x ), can be written in the form given by Eqs. (5), (7) and (10) [with Eqs. (8) and (12)]. Hence, these asymptotic fields can be used to satisfy the usual boundary conditions on the electromagnetic field at the rough surface x 3 = ζ(x ) [51,52]. In this way, one obtains the so-called Rayleigh equations, a set of coupled inhomogeneous integral equations, which the reflection and transmission amplitudes should satisfy.
In the mid-1980s, it was demonstrated by Brown et al. [34] that either the reflection or transmission amplitude could be eliminated from the Rayleigh equations, resulting in an integral equation for the remaining amplitude only. Since this latter integral equation contains only the field above (below) the rough surface, it has been termed the reduced Rayleigh equation for reflection (transmission). Subsequently, reduced Rayleigh equations for two-dimensional film geometries, i.e., a film of finite thickness on top of an infinitely thick substrate, where only one interface is rough, was derived by Soubret et al. [39,40] and Leskova [43,44]. Moreover, reduced Rayleigh equations for reflection from clean, perfectly conducting, two-dimensional randomly rough surfaces [53] and reduced Rayleigh equations for transmission through clean, penetrable two-dimensional surfaces [42] have been derived.
For the purposes of the present study, we limit ourselves to a scattering system consisting of a clean, penetrable, two-dimensional rough surface x 3 = ζ(x ) (Sec. II). If the scattering amplitudes are organized as the 2 × 2 matrix the reduced Rayleigh equation (for reflection) for this geometry can be written in the form [38][39][40] where and where the integrals in Eqs. (14a) and (14b) are over the entire q -plane and x -plane, respectively. Reduced Rayleigh equations for transmission, or film geometries with only one rough interface, will have a similar structure to Eq. (14) [39,40], and can be solved in a completely analogous fashion. It should be mentioned that the reduced Rayleigh equation can serve as a starting point for most, if not all, perturbation theoretical approaches to the study of scattering from rough surfaces [19]. For example, McGurn and Maradudin studied the scattering of light from two-dimensional rough surfaces based on the reduced Rayleigh equation, going to fourth order in the expansion in the surface profile function, and demonstrating the presence of enhanced backscattering [38].

C. Mean Differential Reflection Coefficient
The solution of the reduced Rayleigh equation determines the scattering amplitudes R αβ (q |k ). While this quantity completely specifies the total field in the region above the surface, it is not directly measurable in experiments. A more useful quantity is the mean differential reflection coefficient (DRC), which is defined as the timeaveraged fraction of the incident power scattered into the solid angle dΩ s about the scattering direction q. The mean DRC is defined as [38] where L 2 is the area of the plane x 3 = 0 covered by the rough surface. In this work, we are mainly interested in diffuse (incoherent) scattering. Since we consider weakly rough surfaces, the specular (coherent) scattering will dominate, and it will be convenient to separate the mean DRC into its coherent and incoherent parts. By coherent scattering, we mean the part of the scattered light which does not cancel when the ensemble average of R αβ is taken, i.e., the part where the scattered field is in phase between surface realizations. Conversely, the incoherent part is the part which cancels in the ensemble average. The component of the mean DRC from inco-herent scattering is [38] ∂R αβ The contribution to the mean DRC from the coherently scattered light is given by the difference between Eqs. (15) and (16).

D. Conservation of Energy
As a way to check the accuracy of our results, it is useful to investigate energy conservation. If we consider a metallic substrate with no absorption, the reflected power should be equal to the incident power. The fraction of the incident light of polarization β which is scattered into polarization α is given by the integral of the corresponding mean DRC over the upper hemisphere: For a non-absorbing metal, if we send in light of polarization β, we should have if energy is conserved. While the conservation of energy is useful as a relatively simple test, it is important to note that it is a necessary, but not sufficient, condition for correct results.

IV. NUMERICAL SOLUTION OF THE REDUCED RAYLEIGH EQUATION
The starting point for the numerical solution of the reduced Rayleigh equation is a discretely sampled surface, from which we wish to calculate the reflection. We will limit our discussion to quadratic surfaces of size L × L, sampled on a quadratic grid of N x × N x points with a grid constant 6 In this paper, we will present results for numerically generated random surfaces. These are generated by what is known as the Fourier filtering method. Briefly, it consists of generating uncorrelated random numbers with a Gaussian distribution, transforming them to Fourier space, filtering them with the square root of the surface power spectrum g(k ), and transforming them back to real space. The interested reader is referred to, e.g., Refs. [25,33].
The next step towards the numerical solution of the reduced Rayleigh equation is the evaluation of the integrals I(γ|Q ) defined in Eq. (14b). These integrals are so-called Fourier integrals and care should be taken when evaluating them due to the oscillating integrands [54]. Using direct numerical integration routines for their evaluation will typically result in inaccurate results. Instead, a (fast) Fourier transform technique with end point corrections may be adapted for their evaluation, and the details of the method is outlined in Ref. [54]. However, these calculations are time consuming, since I(γ|Q ) must be evaluated for all values of the arguments γ = α 1 (p ) − α 2 (q ) and γ = α 1 (p ) − α 2 (k ) [55].
Instead, a computationally more efficient way of evaluating I(γ|Q ) is to assume that the exponential function exp −iγζ(x ) , present in the definition of I(γ|Q ), can be expanded in powers of the surface profile function, and then evaluating the resulting expression term-by-term by Fourier transform. This gives whereζ (n) (Q ) denotes the Fourier transform of the nth power of the profile function, i.e., In practice, the sum in Eq. (20a) will be truncated at a finite value n = J, and the Fourier transforms are calculated using a fast Fourier transform (FFT) algorithm. The advantage of using Eqs. (20) for calculating I(γ|Q ), rather than the method of Ref. [54], is that the Fourier transform of each power of ζ(x ) can be performed once, and changing the argument γ in I(γ|Q ) will not require additional Fourier transforms to be evaluated. This results in a significant reduction in computational time. The same method has previously been applied successfully to the numerical solution of the onedimensional reduced Rayleigh equation [35][36][37].
It should be noted that the Taylor expansion used to arrive at Eq. (20) requires that γζ(x ) 1 to converge reasonably fast, putting additional constraints on the amplitude of the surface roughness which may be more restrictive than those introduced by the Rayleigh hypothesis. Hence, surfaces exist for which the Rayleigh hypothesis is satisfied, but the above expansion method will not converge, and the more time-consuming approach of Ref. [54] will have to be applied.
Next, we need to truncate and discretize the integral over q in Eq. (14a). We discretize q on a grid of equidistant points, with spacing ∆q, such that where i, j = 0, 1, 2, . . . , N q − 1, and Q = ∆q(N q − 1).
Here, N q denotes the number of points along each axis of the grid. Additionally, we limit the integration over q to the region q ≤ Q/2. The choice of a circular integration domain reduces the computational cost, and will be discussed in more detail in Sec. VI. Converting the integral into a sum by using a two-dimensional version of the standard mid-point quadrature scheme, we get the equation: Here, the sum is to be taken over all q ij such that q ij ≤ Q/2, where q ij = q ij . This sum yields a matrix equation where the unknowns are the four components of R(q ij |k ). It is evident from Eq. (8) that if we consider incident light of either p or s polarization, we need only calculate two of the components of the scattering amplitude to fully specify the reflected field. Hence, we solve separately for either p-polarized incident light, i.e., R pp and R sp , or s-polarized incident light, i.e., R ss and R ps . In either case, we have twice as many unknowns as the number of values of q ij included in the sum in Eq. (22). Note that the left hand side of the equation system is the same for both incident polarizations, and will also remain the same for all angles of incidence, as k only enters at the right hand side of Eq. (22).
In order to solve for all unknowns, we need to discretize p as well, to obtain a closed set of linear equations. Using the same grid for p as for q will give us the necessary number of equations, as Eq.(22) yields two equations for each value of p . Since we integrate over a circular q domain, with q discretized on a quadratic grid, the exact number of values of q ij will depend on the particular values of Q and N q , but will be approximately (π/4)N 2 q . In order to take advantage of the method for calculating I(γ|Q ) described by Eq. (20), it is essential that all possible values of p − q and p − k [see Eq. (22)] fall on the grid of wave vectors Q resolved by the Fourier transform of the surface profile we used in that calculation. First, we note that when p and q are discretized on the same quadratic grid, the number of possible values for each component of p − q will always be an odd number, 2N q − 1, where N q is the number of possible values for each component of p and q . Thus, by choosing N q such that 2N q − 1 equals the number of elements along each axis of the FFT of the surface profile we used to calculate the integrals in Eq. (20b), we ensure that the required number of points is resolved by the FFT [56]. Hence, we choose where x is the floor function of x, which is equal to the largest integer less than or equal to x. Next, we let ∆q equal the resolution of the FFT [54], i.e., and we let Q be equal to the highest wavenumber resolved by the FFT [54], In the end, we get the equation where q ij , as well as p kl and k mn , are defined on the grid given by Eq. (21), with i, j, k, l, m, n = 0, 1, 2, . . . , N q − 1, and where N q , ∆q and Q are given by Eqs. (23), (24) and (25), respectively. Evaluating Eq. (26) for all values of p kl satisfying p kl ≤ Q/2, and assuming one value of k mn , such that k mn < ω/c, and one incident polarization β, results in a closed system of linear equations in R αβ (q ij |k mn ) where α = p, s. Repeating the procedure for both incident polarizations allows us to obtain all four components of R(q ij |k mn ).
With the reflection amplitudes R αβ (q ij |k mn ) available, the contribution to the mean differential reflection coefficient from the light that has been scattered incoherently is obtained from Eq. (16) after averaging over an ensemble of surface realizations.
In passing we note that to avoid loss of numerical precision by operating on numbers with widely different orders of magnitude, we have rescaled all quantities in our problem to dimensionless numbers. When considering an incoming wave of wavelength λ, angular frequency ω, and wave vector k, we have chosen to rescale all lengths in our problem by multiplying with ω/c, and all wavenumbers by multiplying with c/ω, effectively measuring all lengths in units of λ/2π, and the magnitude of wave vectors in units of ω/c.

V. RESULTS
To demonstrate the use of the formalism for solving the reduced Rayleigh equation, the first set of calculations we carried out was for two-dimensional randomly rough silver surfaces. The surface roughness was characterized by an rms height of δ = 0.025λ and an isotropic Gaussian power spectrum [Eq. (3)] of correlation lengths a 1 = a 2 = 0.25λ. In Figs. 2 and 3 we present simulation results for the contribution to the mean differential reflection coefficients from light of wavelength (in vacuum) λ = 457.9 nm that was scattered incoherently from a rough silver surface of size 25λ × 25λ, discretized into 319 × 319 points. The dielectric function of silver at this wavelength is ε 2 = −7.5 + 0.24i, and the angles of incidence were θ 0 = 18.24 • and φ 0 = 45 • . Figure 2 shows the in-plane scattering for this system. The enhanced backscattering peak, a multiple scattering phenomenon, is clearly visible, and is as ex- pected strongest in p → p scattering, since p-polarized light has a stronger coupling to surface plasmon polaritons [19]. Figure 3 shows the full angular distribution of the mean DRC for the same system. In Figs.

3(a)-(c) and Figs. 3(d)-(e) the incident light was p-and s-polarized, respectively. Figures 3(c) and 3(f) show scattering into s-polarization, Figs. 3(b) and 3(e) show scattering into p-polarization and in Figs. 3(a) and 3(d) the polariza-
tion of the scatted light was not recorded. In particular from Fig. 3(b), we observe that the enhancement features seen in Fig. 2 at angular position θ s = −θ 0 , are indeed enhancements in a well-defined direction corresponding to that of retro-reflection, and not some intensity ridge structure about this direction (as has been seen for other scattering systems [45]). Moreover, the structures of the angular distribution of the intensity of the scattered light depicted in Fig. 3 are consistent with what was found by recent studies by using other numerical methods [45,46]. The results presented in Figs A test of energy conservation was performed by simulating the scattering of light from a non-absorbing silver surface (Im ε 2 = 0) with otherwise the same parameters as those used to obtain the results of Figs. 2 and 3. For this scattering system we found |U − 1| ≤ 0.0003, i.e., energy is conserved to within 0.03%, something that testifies to the accuracy of the approach and a satisfactory discretization.
As a further test, we studied the scattering from a set of (absorbing) silver surfaces with the same parameters used to obtain Figs. 2 and 3, except that the rms roughness δ was varied between 0 and 0.045λ, while the correlation lengths were held constant at a 1 = a 2 = 0.25λ ≡ a. For the purpose of comparison, we also performed simulations for a similar set of surfaces but assuming no absorption, i.e., we used ε 2 = −7.5. The results of these tests are presented in Fig. 4.
The reduced Rayleigh equation is only valid for sur-  Fig. 2, and the surface was randomly rough with a Gaussian power spectrum, correlation length was kept constant at a = a1 = a2 = 0.25λ, while the rms roughness δ was varied from 0.0 to 0.045λ. The Fresnel coefficients have been included for comparison.
faces of small slopes [17]. We have found that at least for the parameters used in obtaining Fig. 4, our code gives good results for an rms roughness to correlation-length ratio δ/a 0.12, as judged by energy conservation. For larger values of δ/a, the results look qualitatively much the same, but the ratio of reflected to incident power starts to become nonphysical (increasing past 1), as seen in Fig. 4. It is noted that decreasing the sampling interval ∆q, with Q unchanged, did not change this conclusion in any significant way, indicating that the observed lack of energy conservation was not caused by poor resolution in discretizing the integral over q .
The next set of calculations we performed was for a dielectric substrate characterized by ε 2 = 2.64. Otherwise, all roughness parameters were the same as for the silver surface used to produce Figs. 2 and 3. The mean differential reflection coefficient for light scattered incoherently by the rough dielectric surface is presented in Fig. 5. By comparing these results to those presented in Fig. 3, we notice that the dielectric reflects less than the silver (the figures show only the incoherent scattering, but the same holds for the coherent part), which is as expected. The ratio of reflected to incident power for these data was U = 0.0467 for p-polarized light at an angle of incidence of θ 0 = 18.24 • . Moreover, from Fig. 5 we also notice the absence of the enhanced backscattering peak, which is also to be expected since this phenomenon (for a weakly rough surface) requires the excitation of surface guided modes [19]. Note that for a transparent substrate, it is not possible to verify the conservation of energy without also calculating the transmitted field. Therefore, energy conservation has not been tested for the dielectric sub- strate geometry. So far, we have exclusively considered surfaces with statistically isotropic roughness. For the results presented in Fig. 6, we simulated the light scattering from a silver surface of the same parameters as those assumed in producing the results of Figs. 2 and 3, except that now the surface power spectrum was anisotropic, with correlation lengths a 1 = 0.25λ in the x 1 direction and a 2 = 0.75λ in the x 2 direction and an rms roughness of δ = 0.025λ. Figure 6 shows the incoherent part of the mean DRC averaged over 6,800 surface realizations. In this case, there is more diffuse scattering along the x 1 direction than the x 2 direction, which is to be expected, since a shorter correlation length means the height of the surface changes more rapidly when moving along the surface in this direction. The interested reader is encouraged to consult Ref. [33] for a more detailed study of light scattering from anisotropic surfaces.
Finally, for the results presented in Fig. 7, we have simulated the scattering of light from a surface of size 25λ × 25λ, discretized into 319 × 319 points, with ε 2 = −16 + 1.088i, corresponding to silver at a wavelength λ = 632.8 nm. The surface power spectrum was cylindrical [see Eq. (4)], with k − = 0.82ω/c, k + = 1.97ω/c and rms roughness δ = 0.025λ, and the angles of incidence were (θ 0 , φ 0 ) = (1.6 • , 45 • ). Figure 7 shows the in-plane, incoherent part of the mean differential reflection coefficient averaged over 7,000 surface realizations. From perturbation theory [17,19], we know that for an incident wave of lateral wave vector k to be scattered via single scattering into a reflected wave of lateral wave vector q , we must have g(q − k ) > 0, where g(k ) is the surface power spectrum [Eq. (2)]. Since the power spectrum in this case is zero for |q − k | < 0.82ω/c, we have no contribution from single scattering in the angular interval from θ s = −53.5 • to θ s = 56.7 • (for the angles of incidence assumed here). The enhanced backscattering peak, which is due to multiple scattering processes, is clearly visible in Fig. 7 (at θ s = −θ 0 ) partly because it is not masked by a strong single scattering contribution.

VI. DISCUSSION
A challenge faced when performing a direct numerical solution of the reduced Rayleigh equation for the scattering of light from two-dimensional rough surfaces is the numerical complexity. In this section, we discuss some of these issues in detail.

A. Memory Requirements
Part of the challenge of a purely numerical solution of the reduced Rayleigh equation by the formalism introduced by this study, is that it requires a relatively large amount of memory. With approximately N = (π/4)N 2 q possible values for q , the coefficient matrix of the linear equation system will contain approximately (2N ) 2 elements, where the factor 2 comes from the two outgoing polarizations. Hence, the memory required to hold the left hand side of the equation system will be approximately 4N 2 η, where η is the number of bytes used to store one complex number.
If each element is a single precision complex number, which is what was used to obtain the results presented in this paper, then η = 8 bytes, and the matrix will require approximately 2π 2 N 4 q bytes of memory for storage. For instance, with the choice N x = 319, which was used in all the simulations presented in this paper, and that corresponds to N q = 160 [Eq. (23)], the coefficient matrix will take up approximately 12 GB of memory.
Note that if we instead performed the q integration in Eq. (14a) over a square domain of edge Q, the number of elements in the resulting coefficient matrix would be (2N 2 q ) 2 = (16/π 2 )(2N ) 2 . Hence, by restricting the q integration present in the reduced Rayleigh equation to a circular domain of radius Q/2, the memory footprint of the simulation is approximately π 2 /16 ≈ 0.62 of what it would have been if a square integration domain of edge Q was used. For this reason, a circular integration domain has been used in obtaining the results presented in this paper. However, we have checked and found that using a square q integration domain of a similar size will not affect the results in any noticeable way. Indeed, if this was not the case, it would be a sign that Q was too small. When determining the system size, we can freely choose the length of the edge of the square surface, L, and the number of sampling points along each direction, N x . These parameters will then fix the resolution of the surface, ∆x, the resolution in wave vector space, ∆q, the number of resolved wave vectors, N q , and the cutoff in the q integral, Q [see Eqs. (19), (24), (23), and (25)]. The combination of ∆q and Q then determines the number of resolved wave vectors that actually fall inside the propagating region, |q | < ω/c, which is identical to the number of data points used to produce, e.g., Fig. 3.
As we are not free to choose all the parameters of the system, it is clear that some kind of compromise is necessary. The number of sampling points of the surface along each direction, N x , and how it determines N q via Eq. (23), determines the amount of memory needed to hold the coefficient matrix, as well as the time required to solve the corresponding linear set of equations. Hence, the parameter N x is likely limited by practical considerations, typically by available computer hardware or time. For a given value of N x , it is possible to choose the edge of the square surface, L, to get good surface resolution, at the cost of poor resolution in wave vector space, or vice versa. Note also that changing L will change Q via ∆q [see Eqs. (24) and (25)]. If Q is not large enough to include evanescent surface modes, like surface plasmon polaritons, multiple scattering will not be correctly included in the simulations, and the results can therefore not be trusted. The optimal compromise between values of N x and L depends on the system under study.

B. Calculation Time
The simulations presented in this paper were performed on shared-memory machines with 24 GB of mem-ory and two six-core 2.4 GHz AMD Opteron processors, running version 2.6.18 of the Linux operating system. The code was parallelized using the MPI library, and the setup of the linear set of equations ran on all 12 cores in the timing examples given. The linear equation solver used was a parallel, dense solver based on LUdecomposition [54] (PCGESV from ScaLAPACK), which runs efficiently on all 12 cores. Setting up the equation system scaled almost perfectly to a large number of cores, while the solver scaled less well, due to the need for communication. Numerically solving the reduced Rayleigh equation for the scattering amplitudes associated with one realization of a rough surface, discretized onto a grid of 319 × 319 points, took approximately 17 minutes on the architecture described above, and required about 12 GB of memory. Out of this time, approximately 100 seconds was spent setting up the equation system, 950 seconds was spent solving it by LU decomposition, and typically around 1 second was spent on other tasks, including writing data to disk. Table I shows timing and memory details of the calculations, including other system sizes.
Based on the discussion in Sec. VI A, we note that the use of a circular q integration domain also significantly reduces the time required to solve the resulting linear system of equations. When using a dense solver, the time to solve the systems scales as the cube of the number of unknowns. Thus we expect the CPU time to solve the matrix system for a circular integration domain of radius Q/2 to be about half (π 3 /2 6 ) the time to solve the corresponding system using a square domain of edge Q.
The ratio of the time spent solving one equation system to the total simulation time per surface realization increases with increasing system size, as the time to set up the equation system is O (N 4 x ), while the time to solve the linear system by LU decomposition scales as O(N 6 x ). It is clear from Table I that for any surface of useful size the runtime is completely dominated by the time spent in solving the linear set of equations.
Since the time solving the equation system dominates the overall simulation time, we investigated if one could improve the performance of the simulations by using an iterative solver instead of a direct solver based on LU decomposition. For example, Simonsen et al. [57] recently reported good performance using BiCGStab [58] on a dense matrix system of a similar size. In our preliminary investigations into using iterative solvers, we found that convergence with BiCGStab was slow and unreliable for our linear equation systems. However, it should be stressed that we did not use a preconditioning scheme, which could potentially yield significantly improved convergence.
From Eq. (14a) it follows that changing the angles of incidence and/or the polarization of the incident light changes only the right hand side of the equation system to be solved. Hence, an advantage of using LU decomposition (over iterative solvers) is that the additional time required to solve the system for several right hand sides is negligible, since the overall majority of time is spent factorizing the matrix. Conversely, the time spent using an iterative solver (like BiCGStab) will scale linearly with the number of right hand sides. For these reasons, we have chosen to use an LU-based solver.

C. GPU implementation
Currently, performing simulations like those presented in this paper on a single desktop computer is prohibitively time consuming due to inadequate floating point performance. However, the increasing availability of powerful graphics processing units (GPUs) has the potential to provide computing power comparable to that of a powerful parallel machine, but at a fraction of the cost. As the most time-consuming step in our simulations is the LU decomposition of the system matrix (see Table I), this is where efforts should be made to optimize the code. With this in mind, the simulation code was adapted to (optionally) employ version 1.0 of the MAGMA library [59] for GPU-based LU decomposition. Performance was compared between a regular supercomputing service and a GPGPU (General Purpose GPU) testbed. On the regular service, the code was running on a single compute node containing two AMD 2.3 GHz 16-core processors and 32 GB of main memory. On the GPGPU testbed, the hardware consisted of a single Nvidia Fermi C2050 processor with 3 GB of dedicated memory and 32 GB of main system memory. For these two computer systems, the initial performance tests indicated that the LU decomposition took comparable time on the two architectures for a system of size N q = 100 (the difference was less than 10%). The time using the GPGPU testbed included the transfer of the system matrix to and from the Fermi card and the decomposition of the matrix. Even though these results are preliminary, it demonstrates that there is a possibility of perform-ing simulations like those reported in this study without having to resort to costly supercomputing resources. Instead, even with limited financial means, they may be performed on single desktop computers with a state-ofthe-art GPU.

VII. CONCLUSION
We have introduced a formalism for performing nonperturbative, purely numerical, solutions of the reduced Rayleigh equation for the reflection of light from twodimensional penetrable rough surfaces, characterized by a complex dielectric function ε(ω).
As an example, we have used this formalism to carry out simulations of the scattering of p-or s-polarized light from two-dimensional randomly rough dielectric and metallic surfaces characterized by isotropic or anisotropic Gaussian and cylindrical power spectra. From the scattering amplitudes, obtained by solving the reduced Rayleigh equation, we calculate the mean differential reflection coefficients, and we calculate the full angular distribution of the scattered light, with polarization information. For the scattering of light from weakly rough metal surfaces, the mean differential reflection coefficient shows a well-defined peak in the retro-reflection direction (the enhanced backscattering phenomenon). From previous experimental and theoretical work, this is to be expected for such scattering systems. Moreover, the obtained angular distributions of the intensity of the scattered light show the symmetry properties found for strongly rough surfaces in recent studies using other simulation methods.
For the purpose of evaluating the accuracy of our simulation results, we used the conservation of energy for a corresponding non-absorbing scattering system. This is a required, but not sufficient, condition for the correctness of the numerical simulations. By this method, we found that within the validity of the reduced Rayleigh equation our code produces reliable results, at least for the parameters assumed in this study. In particular, for a rough non-absorbing metal surface of the parameters used in this study, energy was conserved to within 0.03%, or better. This testifies to the accuracy of the approach and a satisfactory discretization. Moreover, we also performed simulations of the scattered intensity for systems where the rms roughness of the surface was systematically increased from zero with the other parameters kept unchanged. It was found that energy conservation was well satisfied (for the parameters assumed) when the ratio of rms roughness (δ) to correlation length (a), satisfied δ/a 0.12.
We believe that the results of this paper provide an important addition to the collection of available methods for the numerical simulation of the scattering of light from rough surfaces. The developed approach can be applied to a wide range of scattering systems, including clean and multilayered scattering systems, that are relevant for nu-merous applications.

ACKNOWLEDGMENTS
We would like to acknowledge the help of Dr. Chris Johnson at the EPCC, University of Edinburgh, for help in parallelizing the code. We are also indebted to Dr. A. A. Maradudin for discussions on the topic of this paper. The work of T.N. and P.A.L. was partially carried out under the HPC-EUROPA2 project (project number: 228398) with support of the European Commission -Capacities Area -Research Infrastructures. The work was also supported by NTNU by the allocation of computer time.