Anomalous diffusion: A basic mechanism for the evolution of inhomogeneous systems

In this article we review classical and recent results in anomalous diffusion and provide mechanisms useful for the study of the fundamentals of certain processes, mainly in condensed matter physics, chemistry and biology. Emphasis will be given to some methods applied in the analysis and characterization of diffusive regimes through the memory function, the mixing condition (or irreversibility), and ergodicity. Those methods can be used in the study of small-scale systems, ranging in size from single-molecule to particle clusters and including among others polymers, proteins, ion channels and biological cells, whose diffusive properties have received much attention lately.


A. General concepts
Diffusion is a basic transport process involved in the evolution of many non-equilibrium systems towards equilibrium [1][2][3][4][5][6][7][8][9][10][11][12][13]. By diffusion, particles (or molecules) spread from regions of high concentration to those of low concentration leading, via a gradual mixing, to a situation in which they become evenly dispersed. Diffusion is of fundamental importance in many disciplines; for example, in growth phenomena, the Edwards-Wilkinson equation is given by a diffusion equation plus a noise [14]. In cell biology it constitutes a main form of transport for amino acids and other nutrients within cells [15].
For more than two hundred years [1], diffusion has been a widely studied phenomena in natural science due to it's large number of applications. Initially, the experiments carried out by Robert Brown [16,17] called the attention to the random trajectories of small particles of polen and also of inorganic matter. This irregular motion, later named Brownian motion, can be modeled by a random walk in which the mean square displacement is given by Einstein's relation [1] (∆r) 2 = 2dDt, where ∆r is the displacement of the Brownian particle in a given time interval t, d is the spatial dimension, and * fao@fis.unb.br D the diffusion coefficient. Whereas a single Brownian particle trajectory is chaotic, averaging over many trajectories reveals a regular behavior.
The purpose of this article is to review recent efforts aiming at formulating a theory for anomalous diffusion processes-i.e., those where the mean square displacement does not follow Equation (1) which can be applied to many different situations from theoretical physics to biology.
In the next section, we introduce the pioneering works on diffusion, and then call attention to the existence of anomalous diffusion. We discuss the main methods to treat anomalous diffusion and concentrate our efforts on the discussion of the generalized Langevin formalism.

B. The tale of the three giants
At the birth of the gravitational theory Isaac Newton mentioned that he build up his theory based on the previous works of giants. Also, one cannot talk about diffusion without underscoring the works of Albert Einstein, Marian Smoluchowski, and Paul Langevin. At the dawn of the last century, the atomic theory was not widely accepted by the physical community. Einstein believed that the motion observed by Brown was due to the collisions of molecules such as proposed by Boltzmann in his famous equation. However, Boltzmann's equation was difficult to solve and therefore he proposed a simpler analysis combining the kinetic theory of molecules with the Fick's law, arXiv:1902.03157v1 [cond-mat.stat-mech] 8 Feb 2019 from which he obtained the diffusion equation where ρ(x, t) is the density of particles at position x and time t. The solution of the above equation yields ρ(x, t) and lim t→∞ x 2 (t) = x 2 (t)ρ(x, t)dx = 2Dt, where x(t) = 0, assuming the symmetry ρ(−x, t) = ρ(x, t). For simplicity, we have considered the motion one-dimensional and set all particles at the origin at time t = 0. Generalization to two and three dimensions is straightforward. He then considered the molecules as single non-interacting spherical particles with radius a and mass m, and subject to a friction γ when moving in the liquid [1, 18,19] . Finally, he deduced the famous Einstein-Stokes relation [19] for the diffusion constant where N a is Avogadro's number, fundamental for atomic theory, but unknown at the time, R is the gas constant, µ the mobility, η the viscosity, and T the temperature. The scientific community became very excited and, in the years following Einstein's papers, some dedicated experiments helped to verify Equation (3). Diffusion constants were measured and it became possible to estimate Avogadro's number from different experiments. Moreover, it was possible to estimate the radius of molecules with few hundreds of atoms. Einstein was successful in demonstrating that atoms and molecules were not mere illusions as critics used to suggest. Finally, the theory of Brownian motion started to set a firm ground. For instance, it became possible to associate diffusion with conductivity in the case of a gas of charged particles. Supposing each has the same charge e and is subjected to a time dependent electric field E = E(ω) exp(−iωt), the conductivityσ(ω) can be defined by J(ω) =σ(ω) E(ω), where J(ω) is the current. Now, it was possible to relate the diffusivityD(w) withσ(ω) by the relation [1, 20,21] where is n the carrier density. From that it was obvious that connections between a diversity of response functions could be obtained. Two major achievements in the theory of stochastic motion were due to Smoluchowski. As expressed by Novak et al. [22], "One was the Smoluchowski equation describing the motion of a diffusive particle in an external force field, known in the Western literature as the Fokker-Planck equation [23,24]" where and Here P (v , δt, v) is the transition probability between two states with different velocities. The second one, a fundamental cornerstone of molecular physical chemistry and of cellular biochemistry [22,25] is "Smoluchowski's theory of diffusion limited coagulation of two colloidal particles." Unfortunately, due to his premature death, the Nobel prize was not awarded to Smoluchowski. However, the scientific community pays tribute to him [26]. The last of the three giants was Langevin, who considered Newton's second law of motion for a particle as [27] dividing the environment's (thermal bath) influence into two parts: a slow dissipative force, −mγv, with time scale τ = γ −1 , and a fast random force f (t), which changes in a time scale ∆t τ , subject to the conditions and If we solve Equation (9) and, using the equipartition theorem, impose v 2 (t → ∞) = v 2 eq = k B T /m, where k B = R/N a is the Boltzmann constant, we obtain Λ = 2mγk B T and write This last equation establishes a relation between the fluctuation and the dissipation in the system reconnecting the useful, although artificial, separation of the two forces. This relation has been named the fluctuationdissipation theorem (FDT) and is one the most important theorems of statistical physics. Equations (10), (11), and (13) yield the velocity-velocity correlation function that reads [28] C the mean square displacement (15) and known as the Kubo formula. This bring us back to Einstein's results for diffusion, Equation (4). The simplification introduced by the Langevin formalism makes it easy to carry out analytical calculations and computer simulations. Consequently, the Langevin equation, and its generalization (Sec. 3), has been applied successfully to the study of many different systems such as chain dynamics [29][30][31][32][33][34], liquids [35,36], ratchets [37,38], and synchronization [39,40]. However, its major importance was to relate fluctuation with dissipation.
The Fokker Planck equation (6) with the Langevin choices becomes Inspection of different supposedly diffusive processes such as enhanced diffusion in the intracellular medium [42], cell migration in monolayers [43], Levy flight search on a polymeric DNA [44], or the Brownian motion in an inhomogeneous medium [45] reveals that the previous framework is not always fulfilled. Instead, in these and other similar examples the mean square displacement deviates from the linear temporal evolution. One of the most common anomalous behaviours is given by where α = 1 is a real positive number [4][5][6]46]. The origin of this discrepancy is the tacit assumption made in the derivation of (1) that the Brownian particle moves in an infinite structureless medium acting as a heat bath. This assumption is generally incorrect when the Brownian motion takes place in a complex medium, as is the case of the previously mentioned examples. We illustrate in Fig. (1) the mean square displacement x 2 (t) as a function of t for three distinct Brownian motions in one dimension. From the upper curve downwards, we have α = 1.5 (superdiffusion), α = 1 (normal diffusion), and α = 0.5 (subdiffusion).
There are many formalisms that describe anomalous diffusion, ranging from thermodynamics [78][79][80], and fractional derivatives [4,6] to generalized Langevin equations (GLE) [81][82][83]. The goal of this short review is to call attention to relevant research within the GLE framework and to some fundamental theorems in statistical mechanics.

A. Non-Markovian processes
The Langevin formalism within its classical description has some restrictions: 1. It has a relaxation time τ = γ −1 , and an unspecified ∆t τ , while a diffusive process in a real system can present several time scales; 2. For short times t < ∆t, predictions are unrealistic; for example, the derivative of C v (t) is zero [84] at t = 0, while in Langevin's formalism C v (t) = exp(−γ|t|), exhibiting a discontinuity.
3. If no external field is applied it cannot predict anomalous diffusion.
Up to now we have used time and ensemble averages without distinction, i.e. we implicitly used the Boltzmann Ergodic hypothesis (EH). The ensemble average for a variable B(t) is defined by where Ω(E, B) is the number of states for a given energy E. For the time average we have For times τ 0 τ , the Ergodic Hypothesis (EH) reads which means that the system should be able to reach every accessible state in configuration space given enough time. This is expected to be true for equilibrated macroscopic systems and also for systems that suffer small perturbations close to equilibrium. In this section, we generalize Langevin's equation and study some of its consequence for diffusion. The first correction to the FDT was done by Nyquist [85] who formulated a quantum version of the FDT. Later on Mori [86] and Kubo [87] used a projection operator method to obtain the equation of motion, the Generalized Langevin equation (GLE), for a dynamical operator A(t), where Π(t) is a non-Markovian memory and F (t) is a random variable subject to 1. F (t) = 0, 2. F (t)A(0) = 0, and, 3. the Kubo fluctuation-dissipation theorem (FDT) [88,89] In this way, it is clear that time translational invariance holds in the Kubo formalism. An alternative to the projection operators, the recurrence relation method, was derived by Lee [84,[90][91][92].
It is easy to show that Equation (22) can give rise both to normal and to anomalous diffusion [1]. Let us consider two limiting case examples: first, when Π(t) = 2γδ(t), we recover the normal Langevin equation (9) with normal diffusion (α = 1); second, for a constant memory where Equation (24) is the equation of motion for a harmonic oscillator with zero diffusion constant (D = 0) and with exponent α = 0. Consequently, we may have different classes of diffusion for distinct memories. Now, we can study the asymptotic behavior of Equation (25) or of its second moment, Equation (18), to characterize the type of diffusion presented by the system for any memory Π(t). Much information about the system's relaxation properties can be obtained by studying the correlation function The existence of stationary states warrants timetranslation invariance so that the the two-time correlation function becomes a function only of the difference between two times, such as in the Kubo FDT above. We shall return to this point later. The main equation we are interested in is Equation (1), where for anomalous diffusion, D should be replaced by D(t) now defined by From here onwards, the tilde over the function stands for the Laplace transform. For the last equality, we use the final-value theorem for Laplace transforms [93], which states that if f (t) is bounded on (0, ∞) and lim t→∞ f (t) has a finite limit, then lim t→∞ f (t) = lim z→0 z f (z). Also note that the Laplace transform of the integral of a function is the Laplace transform of the function divided by z, then D(z) = C A (z)/z. Using z ∝ 1/t, we obtain the asymptotic behavior of D(t). In order to do that, we multiply Eq. ( 22) by A(0), take the average and use the conditions (1) and (2) above for the noise to obtain the self-consistent equation where for simplicity we have defined Note that from Equation (22), we need to average from a large number of stochastic trajectories, while to obtain Eq. (29), it is only necessary to solve a single equation, i.e. Eq. (28). Further insight can be gained by analyzing the Laplace transformed version of Equation (28) From here, it is clear that the knowledge of Π(z) in the limit z → 0 completely defines the asymptotic dynamics.
and consequently [5] where We see that we have a cutoff limit for the exponent β and, therefore, also for α.
Due to the existence of correlations in the GLE that arise through hydrodynamical interactions [28], it has been proposed [5,46,94] to establish a connection between the random force F (t) and the noise density of states ρ(ω) of the surrounding media, modeled as a thermal bath of harmonic oscillators [5,81] of the form where 0 < φ < π are random phases. Now, using the Kubo FDT, eq. (23), and time averaging over the cosines, we obtain the memory as [5,8] which is an even function independent of the noise distribution. Here, ρ(ω) = C 2 (ω) A 2 eq /2. The consequences of considering a colored noise given by a generalization of the Debye spectrum with ω s as a Debye cutoff frequency were analyzed in detail in [95]. The reason for the choice of this functional form for the noise density of states is that it was previously shown in [5,8] that if Π(z) ∝ z µ as z → 0, then the same restriction as in Equation (33), with ν = µ, applies and the diffusion exponent [5] is given by Equation (32). Later, this problem was revisited by Ferreira et al. [96] in which a generalized version of Equation (18) was considered, namely Most authors [4][5][6]46] have reported the cases of anomalous diffusion where, n = 0 and α = 1. However, some authors such as for example, Srokowksi [97,98] reports situations were for t → ∞, the dispersion behaves as i.e., a weak subdiffusive behavior for which we can say that α = 1 − . In this way Ferreira et al. [96] generalizes the concept of α, to associate with Eq. (37), the α ± exponents, which arise analogously to the critical exponents of a phase transition [99][100][101]. For example, in magnetic systems with temperatures T close to the transition temperature T c , the specific heat at zero field, H = 0, exhibits the power law behavior C H=0 ∝ |T − T c | −α , where α is the critical exponent. However, for the twodimensional Ising model [99] the critical exponent can be considered α = 0 + , since the specific heat behaves logarithmically, C H=0 ∝ ln |T − T c |, instead. Logarithmic corrections [101] to scaling have also been applied to the diluted Ising model in two dimensions in [102]. This generalized nomenclature is pertinent because there are many possible combinations of both logarithmic and power law behaviors. This result highlights the existence of different types of diffusion.
In this way, for the density of states (36), the generalized α becomes In Equation (36), we choose the constant such that for normal diffusion Γ(z = 0) = γ. The exponent for Ballistic diffusion (BD), α = 2, is the maximum for diffusion in the absence of an external field. The slow ballistic motion α = 2 − has properties that differ markedly from the ballistic case, see sections (3.3 and 3.4).

B. Non-exponential relaxation
Besides the importance of the asymptotic behavior, the study of the correlation R(t) for finite times is also obviously significant, and there exists a vast literature describing non-exponential behavior of correlation functions in systems ranging from plasmas to hydrated proteins [103][104][105][106][107][108][109][110][111], since the pioneering works of Rudolph Kohlrausch [112] who described charge relaxation in Leyden jars using stretched exponentials, R(t) ≈ exp [−(t/τ ) β ] with 0 < β < 1, and his son Friedrich Kohlrausch [113], who observed two universal behaviors: the stretched exponential and the power law. Since many features are shared among such systems and those that present anomalous diffusion [94,114], it is natural that similar methods of analysis can be applied to both. For example, from Equation (28), dR(t) dt must be zero at t = 0, which is at odds with the result R(t) = exp(−γ|t|) of the memoryless Langevin equation. Nevertheless, we know that the exponential can be a reasonable approximation in some cases: Vainstein et al. [95] have presented a large diversity of correlation functions that can be obtained from Equation (28) once Π(t) is known. Since, from Equation (35), Π(t) is an even function then we can write From Equation (28), they proved that R(t) must also be an even function, therefore with a 0 = R(0) = 1. We insert Equations (40) and (41) into Equation (28) to obtain the recurrence relation [95] which shows the richness and complexity of behavior that can arise from a non-Markovian model. The above defined convergent power series represents a large class of functions, including the Mittag-Leffler function [115] which behaves as a stretched exponential for short times and as an inverse power law in the long time scale. Note that even for the simplest case of normal diffusion, ν = 0, R(t) is not an exponential since at the origin its derivative is zero; however, for broad-band noise ω s γ, i.e., in the limit of white noise it approaches the exponential R(t) = exp(−γt), for times larger than τ s = ω −1 s . In Fig. (2), from [95], we display the rich behavior of the correlation function R(t) for the case of normal diffusion. Here, γ = 1 and ω s = 2 and 20, for curves a and b, respectively. Curve c is the plot of exp(−γt). In the inset, we highlight that although curve b and the exponential approach one another for long times, for short times they differ appreciably. Also plotted is cos(ω 0 t), with ω 0 = Π(0).

The decay towards an equilibrium state
One of the most important aspects of dynamics is to observe the asymptotic behavior of a system or how it approaches equilibrium (or not). Observe that a direct solution for Equation (22) is Given the initial states A(0), it is possible to average over many trajectories to obtain the temporal evolution of the moments A n (t), with n = 1, 2, . . .. The first moment arises directly from an average of Equation (43), Taking the square of Equation (43) and averaging, we obtain The skewness is defined as a measure of the degree of asymmetry of the distribution of A(t), and is given by [83] where We also obtain the non-Gaussian factor [83] Consequently, we see that R(t) completely determines these averages. One can note that if the system is originally at equilibrium A(0) = 0 and A 2 (0) = A 2 eq , then the system remains in equilibrium. If the system is not in equilibrium, then drives the system towards equilibrium. The condition stated in Equation (48) is called the mixing condition (MC) and is a fundamental concept in statistical mechanics, which asserts that after a long time the system reaches equilibrium and forgets all initial conditions. Note that parity is conserved as well: if the initial skewness is null, ζ(0) = 0 in Eq. (46), it will remain null during the whole evolution; the same holds for the nongaussian factor.

The Kinchin theorem and ergodicity
The Khinchin theorem (KT) [9,116] states that if the the MC holds, then ergodicity holds. As shown below, anomalous diffusion is a good field for testing ergodicity breaking [9,10,83]. For a situation where the MC is violated, we have where κ is the nonergodic factor [8,9,38,117]. For example for Π(z → 0) ∝ bz , α = 2, and I.e., the MC is violated in the ballistic motion, and consequently ergodicity is violated, but not the KT. Note that for α = 2 − , where R(t → ∞) → 1/ ln(t) → 0, the MC is not violated. In this way the MC is satisfied for all diffusive processes in the range 0 < α < 2 − [96]. It is interesting to observe that Equation (49) for long time behavior is equivalent to the condition for systems with long range interactions [118,119], where U ( r) is the potential between the particles and the integration is performed over all space.

Gaussianization
As an illustration of the analytical results, we numerically integrated the GLE, Equation (22), to approximate the particle velocity distribution function, using Equations (35) and (36) to generate the memory for ν = −0.5, 0, and 0.5, which by Equation (39) give α = 0.5, 1, and 1.5, respectively. The results are exhibited in Fig. (3), from [9], where we show the probability distribution functions as function of the momentum p. From left to right, we have subdiffusion (ν = −0.5), normal diffusion (ν = 0), and superdiffusion (ν = 0.5) for the values a = 2γ π = 0.25. A value ω s = 0.5 was used for normal and superdiffusion. In the case of subdiffusion, a broader noise ω s = 2 is needed for it to arrive at the stationary state. It is expected that R(t → ∞) = 0 in all cases, and that the EH will be valid even for the subdiffusion (superdiffusion). It should be noted that despite large fluctuations in the time average, there is a good agreement between the ensemble and time distributions, in agreement with Eqs. (44) to (48), indicating the validity of the EH. In all cases, the distributions converge to the expected Maxwell-Boltzmann distribution, in accordance with analytical results [83].
In Fig. (4 we show the evolution of the nongaussian factor. We see that for the Ballistic diffusion (BD), it does not reach a null value, but it evolves towards it. Note that even for a situation where κ = 0, the nongaussian factor will be very small and the probability of it being non-zero in simulations after a long time is very small. For example for κ = 0.1, there will a factor of 10 −4 in relation (47).
In Fig. (5), from [83], we show the evolution of the kinetic temperature of the system by taking A 2 (t), with A = P , the momentum of the particles. In this case we should have A 2 eq − A 2 eq ∝ T [83] and the temperature evolution can be obtained from Equation (45). We consider both normal and ballistic diffusion (BD) in a reservoir characterized by T = 1.0 with initial high and low temperatures T 0 = 1.5 and T 0 = 0.5, respectively. For normal diffusion (dashed curve), we have R(t) = exp (−γt), with γ = 10 −3 . For BD, R(t) is calculated numerically in [83] with γ = 1. Since BD's relaxation is slow, we take a "friction" γ a thousand times larger than that of normal diffusion for comparison. As expected, in the case of normal diffusion the system's temperature always relaxes to that of the reservoir, while for BD the temperature approaches that of the reservoir without reaching it [83,114]. Figure (4) displays the normalized non-Gaussian factor, Equation (47), as a function of time [83] for the cases in the previous figure, with the same convention for the labeling. For normal diffusion, the system's probability distribution evolves towards a Gaussian, which is not the case for BD. In the latter case, R(t) oscillates around the value predicted by Equation (50), even for long times, In both cases, the initial probability distribution function was the Laplace distribution, with A 2 (0) = 1 and A(0) = 0. (48) to (21). Lapas et al. [9,83] have shown that the KT is valid for all forms of diffusion, and that the ballistic diffusion violates the EH, but not the KT.

Figures (3)-(4) are just illustrations of the results that can be obtained analytically from Equations
Although it is expected that a system in contact with a heat reservoir will be driven to equilibrium by fluctuations, Figs. (5) and (4) show that it is not always the case and in some far from equilibrium situation it may not happen. The concept of "far from equilibrium" is itself sometimes misleading, since it depends not only on the initial conditions, but also on the possible trajectories the system may follow [8,103,104]. It was a known fact that the FDT can be violated in many slow relaxation processes [1, 120, 121], however, it was a surprise that it could occur in a GLE without disorder [122] or an external field [1, 8,20]. Finally, Costa et al. [8] have called attention to the fact that if after a long time the fluctuations are not enough to drive the system to equilibrium then the fluctuation-dissipation theorem is violated. For the dissipation relation to be fulfilled, the MC condition, Equation (48), must be valid.

IV. BEYOND THE BASICS... AND MORE BASIC
In the last sections we have discussed some basic results for anomalous diffusion under the point of view of the formalism of the generalized Langevin equation which yield two main features: simplicity and exact results. Obviously, this approach does not exhaust the subject, since diffusion is a basic phenomenon in physics it is a starting point to many different formalisms which we shall briefly discuss.

A. Fractional Fokker-Planck equation
We have seen that, in principle, all kinds of anomalous diffusion can be described by the GLE formalism, which is itself well established from the Mori method. Since normal diffusion can be studied both from Langevin equations and from Fokker-Planck equations, we would expect to obtain a generalized Fokker-Planck formalism for anomalous diffusion. Indeed, fractal formulations of the Fokker-Planck equation (FFPE) have been widely used in the literature in the last decades, in which the evolution of the probability distribution function P (x, t) reads [3, 4, 6] In both cases, we consider the initial PDF as a Laplace distribution with unit mean and unit variance. The functions R(t) are as in Fig. (5). From [83], with permission from Europhysics Letters.
where on the left-hand-side a fractional Rieman-Liouville time-derivative is defined as [123] with 0 < α < 1. Note that the definition of fractional derivatives is not unique [123,124], with a variety of possibilities, physically (almost) totally unexplored. We notice here that the nonlocal character of the fractional Fokker-Planck equation is similar to the memory kernel in the GLE. On the right hand side, γ α is a generalized friction, U (x) is the potential, while D α is generalized Einstein-Stokes relation The major result from Equation (52) for a force-free diffusion is the asymptotic solution For BD we use the noise distribution ρ(ω) = 2γ/π for 1 < ω < 4 and ρ(ω) = 0, otherwise, and we found R(t) numerically (solid curve). Since BD converges very slowly, we use γ = 1 for BD and γ = 10 −3 for normal diffusion with R(t) = exp(−γt), so we can compare both. We choose T0 = 1.5 and T0 = 0.5 for the initial temperatures. In both cases we used T = 1 for the reservoir temperature. From [83], with permission from Europhysics Letters.

B. Interface growth
Models for interface growth generally consider the random deposition of particles that diffuse to a surface and, as such, have been studied with Langevin equations and modified diffusion equations. Since diffusion with subsequent deposition is ubiquitous, rough surfaces at the interface of two media are very common in nature [14,[126][127][128][129][130] and the description of interface evolution is a very interesting problem in statistical physics. The major objective here is to describe the temporal evolution of the height h( x, t) of the interface between two substrates [14,126], where x is the d dimensional position vector and t is the time. We outline the evolution of h in Fig. (6). In (a), we show a real forest fire propagation, a very complex situation. However, we can focus on the interface between the burnt and unburned regions. In (b), we provide a snapshot at a fixed time t of the height h( x, t) for a microscopic growth process in which the green medium is penetrating the blue one with arbitrary units. These types of dynamics in d + 1 dimensional space are easy to understand, but not so simple to solve analytically.
Experiments can be done for d = 1, 2, 3; however, they present hard theoretical problems for any d.
The two main quantities of interest are the average height where V is the sample volume, and the standard deviation often called the surface width w(t), or roughness. Not surprisingly, this height fluctuation has a lot of information about the physical processes governing the system. The evolution of w(t) observed through experiments, computer simulations, and a few analytical results gives us some general features of growth dynamics. Starting with a flat surface, w(0) = 0, the evolution exhibits four distinct regions: (a) for a very short period 0 < t < t 0 , during which correlations are negligible, the process is a random deposition w(t) ∼ t 1/2 ; (b) for t 0 < t < t × we have w(t) ∼ t β . Here the t × follows a power law of the form t × ∼ L z , where L is the size of the sample, z is the dynamic exponent, and β the growth exponent; (c) there is a transition region for t ∼ t × ; (d) finally, for t t × , the dynamic equilibrium leads to surface width saturation, w s , which also follows a power law w s ∼ L α , where α is the roughness exponent. The crossing of the curves w(t) ∼ t β and w ∼ w s yields the universal relation It should be pointed out that these exponents are not related with those of the previous section.
To obtain w(t), we need to know h( x, t) and there are two major theoretical ways to attack this problem: 1. Continuous growth equations; 2. Discrete growth models. Fig. (6a) is an example of the first, while Fig. (6b) is an example of the second.

Equation of motion and symmetries
The growth, in general, is due to the number of particles per unit of time G( x, t) arriving on the surface at the position x and time t. The particle flux is not uniform since the particles are deposited at random positions [14]. Therefore, the evolution of h( x, t) can be described by where the first term F is the average number of particles arriving at site x. The second term, χ( x, t), reflects the random fluctuations and satisfies where, D g measures the degree of growth randomness. The deterministic flux F must satisfy certain symmetry requirements, such as invariance under translation of position, x → x + x 0 , height, h → h + h 0 and time t → t + t 0 . To satisfy these conditions, it must depend only on derivatives, which, again by the symmetries x → − x and h → −h, must be of even order [14,126]. Besides this, considering that symmetrically possible terms such as ∇ 2n h( x, t), for n = 2, 3, . . . , are irrelevant in the long wavelength limit, since they go to zero faster than ∇ 2 h( x, t), we obtain, at the lowest-order in the derivatives [126], known as Edwards-Wilkinson equation (EW). Note that it is basically a diffusion equation, as (2), plus a noise, where we have a surface tension ν associated with the Laplacian smoothening mechanism. The random deposition model with surface relaxation is in the same universality class as the EW model [131], i.e., they share the same exponents.
Because there are, however, a large class of growth phenomena which are not described by the EW equation, new formulations become necessary. The recently proposed Arcetri models [132] allow the study of more rigid interfaces than the EW model and still allow for exact solutions in any dimension d. Depending on the initial conditions, both growing interfaces and particle motion on the lattice can be modeled. Another classical model was proposed by Kardar et al. [127], inspired by the stochastic Burgers equation. They observed that lateral growth could be added to this equation via the nonlinear term of the Burgers equation so that Equation (62) becomes Since its formulation, the Kardar-Parisi-Zhang equation (KPZ) has been a prime model in the description of growth dynamics. The nonlinear term includes a new constant λ associated with the tilt mechanism and breaks down the symmetry h → −h. Consequently, the universality class of KPZ is different from that of EW. Equation (63) is the simplest nonlinear equation that can describe a large number of growth processes [14,127]. However, the apparent simplicity of this equation is misleading, since the nonlinear gradient term combined with the noise makes it one of the toughest problems in modern mathematical physics [133,134]. On the other hand, its complexity is compensated for its generality. It is connected to a large number of stochastic processes, such as the direct polymer model [135], the weakly asymmetric simple exclusion process [136], the totally asymmetric exclusion process [137], direct d-mer diffusion [138], fire propagation [139][140][141], atomic deposition [142], evolution of bacterial colonies [143,144], turbulent liquid-crystals [145][146][147], polymer deposition in semiconductors [148], and etching [149][150][151][152][153][154][155][156][157][158][159][160].
This problem in non-equilibrium statistical physics is analogous to the Ising model for equilibrium statistical physics, which is used as a basic model for understanding a large class of phenomena. The search for exact solutions to this equation in the d + 1 dimensional space has resulted in important contributions to mathematics; however, to date, they have been obtained only for specific situations and are limited to 1 + 1 dimensions [127,134] .

Scaling invariance
In this small subsection we limit ourselves to scaling symmetries only. However, in interface growth, the shape of single-time and two-time responses and height correlation functions can be derived from the assumption of a local time-dependent scale-invariance [161]. For the KPZ equation, a systematic test of aging scaling was performed, showing the scaling relations for the twotime spatio-temporal autocorrelator and for the timeintegrated response function [151,162,163].
The scaling invariance can be investigated through the transformation [14] x → b x, h → b α and t → t z which yields, for the KPZ equation, (63), and Scaling invariance demands that the exponents in Equations (64,65,66) must be zero. However, a simple inspection shows that they are inconsistent. The renormalization group approach of Kardar el al. [127], uses the nonlinear term as a perturbation, and their result shows that only Equation (66) remains invariant, yielding the famous Galilean invariance Equations (64) and (65) are corrected by the renormalization, and the final result yields α = 1/2, β = 1/3, and z = 3/2, for 1 + 1 dimensions. Since the KPZ renormalization approach is valid only for 1 + 1 dimensions, questions about the validity of the Galilean invariance [164,165] for d > 1 and the existence of an upper critical dimension for KPZ [166,167] have been raised. For d > 1, the numerical simulation of the KPZ equation is not an easy task [164,165,[168][169][170][171], and the use of cellular automata models [149][150][151][152][153][154][155][156][157][158][159][160][172][173][174][175] has become increasingly common for growth simulations. Polynuclear growth (PNG), is a typical example of a discrete model that has received a lot of attention, and the outstanding works of Prähofer and Spohn [176] and Johansson [177] drive the way to the exact solution of the distributions of the heights fluctuations f (h, t) in the KPZ equation for 1 + 1 dimensions [134]. By construction, h → h − h , then f (h, t) has zero mean, so its skewness and kurtosis are the most important quantities to observe [147,[178][179][180][181]. In addition, Langevin equations for growth models have been discussed by some authors [182][183][184]. Several works have been done in the weakly asymmetric simple exclusion process [136], the totally asymmetric exclusion process [137,185], and the direct d-mer diffusion model [138]: for a review see [170,181,186,187]. More experimentally measured exponents for growing interfaces in four universality classes (KPZ, quenched KPZ, EW and Arcetri) can be found in [132] and references therein.
3. Cellular automata growth models Cellular automata are described by simple rules, which allow us to inquire about relevant properties of a complex dynamical system. For a given growth model, the first question to be answered is if the model belongs to the same universality class as KPZ. In this context, we expose here the etching model [149], which has attracted considerable attention in recent years [150][151][152][153][154][155][156][157][158][159][160]. These studies suggest a close relation between the etching model and KPZ. For example, for 1 + 1 dimensions Alves et al. [158] have proven that α = 1/2 exactly. Unfortunately, their method does not allow to obtain β or z. The etching model describes the mechanism of an acid eroding a surface. The detailed description of the model can be found in [149]) For simplicity, let us consider a site i in a hypercube of side L and volume V = L d in space Ψ of dimension d, and look at one of its nearest neighbors, j. If h j > h i , then it becomes equal to h i . We then define the etching model following the steps: 1. At time t we randomly choose a site i ∈ V .
In Fig. (7), we show the mechanism of the etching model in one dimension: (a) step 1, we randomly select a site at time t, here i = 2 shown as green in the figure; step 2, the site i = 2 interacts with its neighbors j = 1, 3. Then in Fig. (72) h 1 (t + ∆t) = h 2 (t), meaning that it is strongly affected, and h 3 (t + ∆t) = h 3 (t), meaning it is not affected; (c) step 3, h 2 (t + ∆t) = h 2 (t) − 1. We describe here a process of strong interactions between the site and its neighbors.
Using these rules and averaging over N e numerical experiments, it is possible to obtain the exponents α, β, and z [149][150][151][152][153][154][155][156][157][158][159][160]. Finally, the upper critical limit for the etching model (KPZ) was recently discussed by Rodrigues et al. [156], where it was shown numerically that there is no upper critical dimension for the etching model up to 6+1 dimensions. Thus, we have established a lower limit for the KPZ critical dimension, i.e., if it exists then d c > 6, in agreement with other authors [138]. They have shown that the Galilean invariance (67) remains valid for d ≤ 6 as well. These problems, however, still lack an exact solution.

C. Reaction-diffusion processes
One could not complete a work on diffusion without a brief discussion of reaction-diffusion processes [185,[188][189][190]. Exactly solvable reaction-diffusion models consist largely of single species reactions in one dimension, e.g., variations of the coalescence process, A + A → A + S [191][192][193] and the annihilation process A + A → S + S [192,193], where A and S denote occupied and empty sites, respectively. These simple reactions display a wide range of behavior characteristic of non-equilibrium kinetics, such as self organization, pattern formation, and kinetic phase transitions. Interval methods have provided many exact solutions for one-dimensional coalescence and annihilation models. The method of empty intervals, applicable to coalescence models, requires solution of an infinite hierarchy of differential difference equations for the probabilities E n of finding n consecutive lattice sites simultaneously empty. For annihilation models, the method of parity intervals similarly requires determination of G n , the probability of n consecutive lattice sites containing an even number of particles [189]. In the continuous limit, such models can be described exactly in terms of the probability E(t, x) of finding an empty interval of size x at time t. Under fairly weak conditions, the equation of motion for E(t, x) is a diffusion equation, albeit with the unusual boundary condition E(t, 0) = 1. In several cases, the exact and fluctuation-dominated behaviour in d = 1 has been seen experimentally, since the 1990s. This is one of the rare cases where theoretical statistical mechanics can be compared with experiments and also shows that simple mean-field schemes are not enough. For an introduction see [188], and for up to date references see [190].
Another important new development in diffusion concerns stochastic resets, as they were introduced by Evans and S.N. Majumdar (see [194][195][196] and references therein). This leads to important modifications of the stationary state and raises the question how to consider the status of the general theorem described in the present review.

V. CONCLUSIONS
In this short review we address diffusion, as a modern and important topic of statistical physics, with broad applications [4-6, 8, 9, 40, 42]. We emphasize the study of systems with memory, for which a generalized Langevin equation applies and describe how the diffusion exponent is obtained from the memory [5,96]. We highlight the properties of response functions [1,83] for the processes of anomalous relaxation [1,114], Gaussianization and ergodicity [9,83]. Moreover, we have established a hierarchy: the mixing condition is stronger than the ergodic hypothesis, which is itself stronger than the fluctuationdissipation theorem. It is important to call attention again to Costa [8], where it was observed that in the ballistic diffusion the fluctuation-dissipation theorem fails. Indeed, for the ballistic motion which, on average, behaves as a Newtonian particle with constant velocity, the fluctuations are not enough to bring the system to equilibrium, i.e., the dissipation, which for large times decays as 1/t, does not balance the fluctuations. This work points out that the violation of the mixing condition breaks down ergodicity, as required by the Khinchin theorem and the fluctuation dissipation theorem.
In conclusion, since the mechanism of diffusion is present in most nonequilibrium processes, diffusion is an exhaustive subject, and we have only called attention to some of its aspects. Consequently, we apologize to the authors of important works not mentioned here , such as aging [122,151,162,163,197] for example.

CONFLICT OF INTEREST STATEMENT
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

AUTHOR CONTRIBUTIONS
FAO suggested the work, and wrote some sections; RMSF was responsible for the section on non-Markovian processes and selected the figures; LCL wrote the ergodicity and decay towards equilibrium sections; MHV collaborated on the section on the generalized Langevin equation and reedited the text.

ACKNOWLEDGMENTS
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior -Brasil (CAPES) -Finance Code 001, CNPq and FAPDF. MHV was supported by a Senior fellowship (88881.119772/2016-01) from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) at MIT. MHV would like to thank Prof. Jeff Gore for the hospitality at Physics of Living Systems, MIT.