Identifying Arguments of Space-Time Fractional Diffusion: Data-Driven Approach

A plethora of complex dynamical systems from disordered media to biological systems exhibit mathematical characteristics (e


INTRODUCTION
Technological advances ranging from an impressive improvement in the sensing and computational resources to the enhancement in data storage play a prominent role in offering researchers' new directions to investigate unknown or poorly understood phenomena and boost numerous scientific areas (e.g., neuroscience, synthetic and system biology, finance, anthropology, and political sciences).This impact on all sciences is likely to persist with the booming advances in data science (DS), machine learning (ML), and artificial intelligence (AI) [1][2][3][4].Apart from the impressive contribution of AI in pattern recognition (e.g., images) and language processing, nowadays AI and data science are contributing to several developing scientific fields (e.g., the discovery of unwanted effects of drugs and drug repositioning [5,6]).For example, one can consider the problem of "Lamisil" drug used for the treatment of skin infections, which caused deaths and other liver reactions after being on the market for years.Although it was found that the drug is behind the TBF-A generation, which has toxicological effects, it was not possible to explain how the drug boosts the TBF-A formation.In 2018, a deep learning-based investigation identified the pathways leading to TBF-A formation [5].The process of TBF-A formation could not be identified before launching the product on the market due to its complicated mechanism and limited information on its manifestations.Such a discovery can have a primordial role in drug discovery and development by pharmaceutical industries and startups working in drug discovery (e.g., Kebotix, Deep Genomics).
Furthermore, AI has been recently considered as a useful analysis and discovery tool in medicine and healthcare.For instance, computer scientists and radiologists incorporated ML and AI techniques into the radiological examination in order to provide better results in medical imaging [7] and to detect early signs of diseases.Indeed, convolutional neural networks based approaches enabled the analysis of three dimensional MRI images and the identification of symptoms of Alzheimer's disease [8].Similarly, AI-based investigation provided new approaches for quantifying the risk of autism [9].It is also worth mentioning that AI has also been introduced in ophthalmology [10].Beyond the medical field, reinforcement learning has been recently used in meteorology in order to study the climate [11,12].Consequently, we are witnessing a paradigm shift in mining and understanding real-world problems, as a variety of late innovations are based on applying ML algorithms to analyze and characterize high-dimensional experimental data.
Despite the tremendous boost that is provided by the ML and AI for the analysis of static data through identifying the statistical interdependence between components of a system of interest, there is little to say about analyzing dynamical processes from big data and uncertainty quantification for large-scale complex systems.Specifically, ML has a limiting ability in deciphering the physical driving laws and governing equations from multi-modal heterogeneous, scarce, and/or noisy time-series data associated with complex systems exhibiting multi-scale and multi-physics spatiotemporal evolution.These multi-scale and multi-physics spatiotemporal characteristics that occur in physics, biology, chemistry, neuroscience, and even geology, are usually encoded through (fractional or integer order) partial differential equations (PDEs) with possibly uncertain parameters.These PDEs are derived from conservation laws on energy, momentum, or electric charge (e.g., diffusion equation, Maxwell's equations, Navier-Stokes equations, Schrodinger equations).However, a plethora of complex systems from biology, neuroscience, or finance have numerous hidden interaction mechanisms, and the derivation of the PDEs describing their evolution is unknown.In the big data era, we witness new opportunities for data-driven discoveries of potentially new physical phenomena and new physics laws (or rules).Consequently, one may ask the following fundamental question: Can we learn a PDE model from a given set of time-series measurements and perform accurate, efficient, and robust predictions using this learned model?This question has motivated researchers to develop methods for estimating PDE parameters using numerical solutions of PDEs [13,14] (which requires careful parameterizations and high computational cost), Bayesian approaches [15], and a twostate approach [16][17][18][19] where the parameters of the PDEs are estimated via least squares.However, exploiting the higher-order statistics of the measurements, which can characterize the rare events for a robust understanding of complex systems, and determining whether fractional or integer order PDEs together with their corresponding parameters govern the observations, has not been addressed.
Diffusion is one of the fundamental mechanisms used for analyzing the transport of particles, and a common example of a diffusion process is the Brownian motion.Chaotic motion of a particle characterizes the latter process, and it can be modeled by a random walk such that the mean square displacement follows the diffusing scaling < ( X) 2 > ∼ t (where < .> designates the mean).Furthermore, diffusion is a principal concept that explains many natural and scientific/technological phenomena (e.g., particles motion [20], DNA and cellular processing [21-24], microbial communities [25], brain activity [26], physiological complexity and cyber-physical systems modeling [27][28][29]), neuron spikes [30].The focus on analyzing complex systems led to studying anomalous diffusion [31][32][33][34][35][36][37][38][39][40][41][42][43] to decipher complex system properties (e.g., long-range memory, higherorder correlations, ergodicity breaking measured as a discrepancy between the long time-averaged mean squared displacement and the ensemble-averaged mean squared displacement).The anomalous diffusion has been shown to be able to describe complex fluid dynamics [44,45], biological systems [46][47][48], transport [49], dynamics in fractal structures [50][51][52], and economics [53].Contrary to random walks processes describing classical diffusion (e.g., Brownian motion), the particle possesses an internal memory that leads to a non-stationary motion, where the mean square displacement is heavy-tailed < ( X) 2 > ∼ t β (β is a parameter that is related to the memory of a particle).
The principal purpose for studying anomalous diffusion is to take into account complex/non-trivial behavior of the motion of particles usually found in transport processes in disordered and complex systems.From a phenomenological perspective, the anomalous diffusion can be better understood by recalling the assumptions made by Einstein [20] on defining the normal diffusion: the motion of the Brownian particles are independent (valid for small concentrations), there exist a small time scale during which the particle displacements are statistically independent (i.e., a Markovian behavior), and the particle displacements at these time scales correspond to a mean free path distributed symmetrically in positive or negative directions (i.e., a symmetric Gaussian statistical behavior).In contrast, anomalous diffusion generalizes the normal diffusion framework by removing one or more of such requirements on either Markovian or Gaussian behavior [42,43].In the literature, there are many methods used to analyze anomalous diffusion, mainly generalized Langevin equation [54,55], thermodynamics [56,57] and in this article, we concentrate our analysis on the discussion of incorporating fractional derivatives into PDEs to model anomalous diffusion.In fact, the mathematical description of anomalous diffusion involves a power law expression of the mean square displacement as a function of time.It often relies on fractional-order derivatives acting on either space or time components of a PDE [29,42,43].Indeed, modeling anomalous diffusion via fractional diffusion equation (i.e., PDE) can be provided by the master equation for continuous time random walk, and the solutions of the fractional diffusion equation can be interpreted as spatial probability densities evolving in time, related to self-similar stochastic process encoding the long-range memory property [58].For instance, the anomalous diffusion of a particle subject to an external nonlinear force and a thermal bath is described in Metzler et al. [59] through a fractional Fokker-Planck equation.Here, the timefractional Riemann-Liouville derivative models the long-range memory effects characteristic to anomalous diffusion in random environments and chaotic Hamiltonian systems [59].More generally, the anomalous diffusion has been successfully modeled by a space-time fractional diffusion equation that assumes that the process has memory (i.e., time-fractional derivative) as well as being non-local (space-fractional derivative).More recently, a comprehensive analysis of the higher-order moments associated with the amplitude fluctuations of the time-averaged mean square displacement for an anomalous diffusion model demonstrated that the skewness and kurtosis can improve the estimation of the anomalous diffusion exponent and can help at classifying the anomalous stochastic processes [60].
Despite the significant body of work on anomalous diffusion models, finding the exact parameters of the corresponding governing PDE is not a trivial task.In this context, given a spatiotemporal dataset, we aim to develop an efficient algorithm for estimating the parameters of the generalized fractional-order PDE that models the dynamics of the process under investigation.Consequently, by identifying these parameters, one can also investigate the physical rules modeled by the PDE.In the results section, we analyze two types of PDEs and discuss algorithmic approaches to determine their parameters from the time-series data.We also provide a simulation study where we verify the correctness and effectiveness of the proposed algorithmic approaches on deriving the exact parameters using synthetic trajectories generated from the PDE model.

Space-Time Fractional Diffusion Equation
The space-time fractional diffusion equation has been proposed in previous works as a mathematical model to analyze anomalous diffusion [34,[61][62][63][64][65].In a nutshell, the space-time fractional diffusion equation in (1) consists of a fractional Riesz-Feller derivative of the order α > 0 (space-derivative) that encodes the space variations and a fractional Caputo derivative of the order β > 0 (time-derivative) that measures the time variations.To better generalize, we also consider the skewness factor in the space derivative of the diffusion equation.Hence, the space-time fractional diffusion equation is defined as where the operators x D α θ and t D β * designate the fractional Riesz-Feller derivative of order α and skewness θ [66], and the Caputo time-fractional derivative of order β [67], respectively 1 .The parameter D denotes the generalized diffusion coefficient.The parameters α, β and θ satisfy the following constraints, 0 < α ≤ 2, 0 < β ≤ 1 and |θ | ≤ min{α, 2 − α}.
Given a set of time-series trajectories that record the evolution of particles or agents that exhibits anomalous diffusion modeled by Equation ( 1), without prior knowledge about the parameters of the space-time fractional diffusion equation, our goal is to use the dataset available to retrieve the exact fractional PDE that generates the given time-series.Toward this end, we develop a mathematical framework where the parameter and mathematical (operator) expression identification task is defined as a regression problem (Figure 1).Indeed, the regression is formulated as a least squares problem, where the minimization involves the theoretical and the empirical statistical (higher order) moments (i.e., specifically the absolute moments).The choice of the statistical moments for performing the regression is convenient because we could derive its closed form expressions just from the generalized fractional PDE given in Equation (1).For the given time-series data, X n (t), 1 ≤ n ≤ N, where N denotes the total number of trajectories, the time empirical moments are defined as follows where x δ denotes the signed absolute δ-th power of x and x δ = |x| δ sign(x).The time-dependent absolute moment of the data generated according to the fractional PDE in ( 1) is given by the following result.
Proposition 1.The time-dependent absolute moment of the order δ with 0 < δ < α is written as follows where Ŵ(•) designates the gamma function.
To find the parameters of the fractional PDE in (1), we rely on analyzing the higher order moments and minimize the quadratic error between the theoretical and the empirical absolute higher order moments.However, due to the nonlinear non-convex expression stated in the Equation ( 3), such a regression problem is non-trivial and it is non-trivial to provide the theoretical guarantees about the convergence of a multidimensional optimization algorithm used to solve the problem in one-shot.To tackle the non-convexity, we aim to approach the parameters estimation via multi-step optimization.For obtaining additional information from the data, we also derive the signed absolute as in the following result.
Proposition 2. The time-dependent signed absolute moment of the order δ with 0 < δ < α is written as follows . ( 4) We refer the reader to the Supplementary Materials for the proofs of the above Propositions.Using the above results, the estimation of parameters is detailed in the following section.

Parameter Estimation
Absolute Moments Approach: Starting from a dataset containing N independent trajectories (realizations of the Equation ( 1) with unknown parameters) sampled uniformly at times {t 1 , t 2 , • • • , t L }, we aim at estimating the actual parameters of the Equation (1) via a moments-based approach, i.e., determining the parameters using empirical moments and the theoretical expressions.Thus, the proposed scheme to find the parameters is mainly a two-step approach, regression over time on one hand and over space on the other hand.The method is summarized as follows.
For a given order δ, the log of absolute moments in Equation (3) varies as where C 1 does not depend on t.Using the estimated empirical moments from Equation ( 2), we replace the theoretical moments with the empirical values in the Equation ( 5).The parameter ratio β/α can then be estimated by performing linear regression of log(t l ) vs. log(M δ t l ) using a total of L points (i.e., l = 1, 2, • • • , L).It is worthwhile to note that the precision in estimation of β/α can be improved upon using multiple values of moment exponent δ for increasing the total diverse points in linear regression.For example, a set = {δ 1 , δ 2 , . . ., δ K } can be used for having K × L diverse points for linear regression by repeating the Equation ( 5) with different δ k .Hence, we can make a trade-off between space and time, i.e., enlarging the cardinality of when the available time-series are short, in other words when L is small.
Next, we use the results of Proposition 1 and 2 to get the ratio of E[|X(t)| δ ] and E[X(t) δ ] as the following ratio r .
Upon replacing the theoretical moments with the ones derived in Equation ( 2) we can invert the tangent function to have the ratio θ/α.In addition, although the ratio r is independent of time t, the empirical ratio of the data will possibly not be a constant across time.We therefore, replace r with time average of the ratio of S δ t l and M δ t l as S δ t l /M δ t l .From section 2.1, we know that the parameter θ is constrained as |θ | ≤ min(α, 2 − α), therefore, we have |θ/α| ≤ 1. Next, we define the following function Finally, the ratio of parameters θ and α can now be written as As argued before, to improve the precision of the estimation, we can add diversity by having a set of moment exponents = {δ 1 , δ 2 , . . ., δ K }.For each δ k ∈ (where k = 1, 2, • • • , K), we use the Equation ( 6) to obtain (θ/α) k , and finally obtain the estimated ratio of parameters as (θ/α) = 1 The absolute moments of order δ from Proposition 1 can be re-written as follows .
Therefore, for the given value of the order δ = δ k , the estimation of α and diffusion coefficient D can be written as the following non-linear equation To solve the non-linear equation in (7) for α and D, we use non-linear least squares method (trust region reflective method).
The input variables to the non-linear optimization are the values of order δ ∈ , where = {δ 1 , δ 2 , . . ., δ K }.Formally, the optimization problem is written as follows.
We note that the values of C k can be efficiently estimated by first performing the linear regression of t δ β α vs E[|X(t)| δ ] with the condition of zero intercept, and then substituting the ratio (slope of the linear regression) in Equation (7).The optimization problem in ( 8) is non-convex, therefore, the global optimum solution is not guaranteed by the solvers.Note that, in some scenarios, we may have a prior knowledge about boundaries of the parameter α (i.e., α min ≤ α ≤ α max ).Thus, we solve the constrained optimization problem.
Finally, the values of β and θ can be estimated upon estimating α, as we already have the ratios β/α and θ/α from Equation (5) and Equation (6), respectively.The approach described is summarized as Algorithm 1.
Remark: Note that for α = 2, when the order δ is close to the boundary values, the theoretical absolute moment goes to +∞.However, the empirical one is finite, so in order to have a small error associated to the estimated parameters, we choose the order δ to be far enough from the end points of allowed region.Notice that, the value of α is unknown while the condition − min{α, 1} < ℜ(δ) < α is dependent of it, so assuming an α min as lower bound for α is rational.We note that the Algorithm 1 is dependent upon the solution of non-linear nonconvex optimization problem in (8), and therefore, convergence to the global solution is not guaranteed.Also, we need to provide an input set such that the absolute and signed moments are computable for unknown α.The choice of has to be made by having some idea about lower bound of the α parameter.To take care of these two issues together, we next present an alternative approach in which we do not require non-linear optimization as well as do not require to have the knowledge of set.
Log Absolute Moments Approach: In this subsection, we rely on the moments of log absolute values of the trajectories.Similar to absolute moments with order δ, the log absolute moments can be computed as we present in the following results.
Proposition 3. The time-dependent expected log absolute value of X(t) is written as follows where γ is the Euler-Mascheroni constant.
Next, the variance of the log absolute values of the trajectories can be written as the following results.
Proposition 4. The variance of log absolute value of X(t) is written as follows var(log It is interesting to note that the variance is independent of time as well as a function of α and θ/α.We exploit this feature of the variance to obtain an estimate of the α with the ratio θ/α known.Hence, the need for performing non-linear optimization in Algorithm 1 is omitted.We also write the second moment of the log absolute values in the following result. Proposition 5.The time-dependent expected square of log absolute value of X(t) is written as follows , and γ is the Euler-Mascheroni constant.
The proof of the Propositions 3, 4, and 5 are provided in the Supplementary Materials.Using the above results for log absolute values, we now present the second approach to estimate the parameters of the space-time fractional PDE in (1).
We proceed similarly to the first approach of the δ order absolute moments, however, now equating the theoretical and The parameter ratio β/α is estimated by performing the linear regression of log(t) vs L (1)  t .The slope of the regression output is the estimated ratio (β/α).Next, the ratio of the parameters θ and α is estimated using the same approach as described previously in Equation (6).
We note that upon having an estimate of the parameter ratio θ/α, the variance is one-to-one function of α since α ≥ 0. Therefore, on substituting the value of θ/α in Equation (10) we compute the value of α.Finally, with α and β/α known, the value of diffusion coefficient is estimated from the intercept of the linear regression of log(t) vs L (1)   t as D. The above described approach is summarized as an algorithmic strategy in Algorithm 2. Calculate log absolute moments L (1)   t ⊲ Eq.12 4: end for 5: β α ← m, (m, c) being (slope, intercept) of linear regression L (1)  t vs log(t) It should be noted that the approach utilizing log absolute moments does not require a predefined set of order values .In addition, this does not suffer from the convergence issues as there is no non-linear non-convex optimization involved.
Note: The estimated parameters in both algorithms are not guaranteed to be optimal.For example, it is not straightforward to guarantee maximum likelihood sense as solving maximum likelihood involves solution to non-convex problem.We evaluate the efficiency of the both algorithms in the following section.

EXPERIMENTAL RESULTS
As we described previously, both algorithms depend mainly on the statistical absolute moments, statistical signed absolute moments and the expected log absolute value of the process X(t).For this reason, we first start by validating the theoretical expressions derived in Equations ( 3), ( 4) and (9).We consider different scenarios (normal diffusion equation, neutral diffusion equation [68][69][70][71], space diffusion equation [71], and time diffusion equation [71]).In these experiments, we generate synthetic data corresponding to N = 100 trajectories simulated according to the diffusion model under study with a generalized diffusion coefficient D = 1.Note that the data generation procedure is presented in details in the Supplementary Material.
In Figure 2, we present panel of 4 × 3 plots where we refer by a row the scenario considered (normal, time, space, neutral diffusion) and we plot the statistical absolute moments, statistical signed absolute moments and the expected log absolute value of the process X(t) for an order δ = 0.001 vs. time in the columns.The signed moment deviates a little from the theoretical expression for some scenarios, due to lack of sufficient samples.For the particular case of Figure 2E, we observe that there is nearly perfect match with the theoretical expression.The reason being, in this case, the parameters α = θ , hence the trajectories are generated from a negatively skewed alpha-stable distribution, therefore X(t) ≤0 ∀t.We note that a similar situation happens when we have the parameters α = −θ .In this case, we have a positive skewed alpha distribution, and therefore, X(t) ≥ 0 ∀t.In all scenarios, we can observe that the empirical statistical moments match perfectly the theoretical ones in all scenarios.This result confirms our theoretical derivations and motivates us to move forward with this approach.

Parameters Estimation of Synthetic Data
In this experiment, we validate the proposed approach using artificially generated spatiotemporal data according to the PDE model presented in Equation (1).More precisely, we use the  type of diffusion considered and columns 1, 2, 3 and 4 designate the parameters α, β, θ and D, respectively.In each of the subfigure, we plot the estimated parameter using both algorithms (blue line for Algorithm 1 and the black line for Algorithm 2) vs. the number of trajectories considered during the estimation process.We also plot the true value as a red line, and a narrow interval around the true value using black dashed lines.The presented blue and gray shaded regions represent the standard deviation for the estimated parameters associated to Algorithm 1 and Algorithm 2, respectively.All sub-figures in Figures 3-5 show that the proposed schemes are doing well in all scenarios where we are able to retrieve the exact set of parameters α, β, θ and D with a small/negligible error 2 .Also, we can see how the standard deviation of the estimated parameters decreases as the number of trajectories increases.Furthermore, we can remark that Algorithm 2 is performing slightly better than Algorithm 1 2 Additional simulations for other scenarios are presented in the Supplementary Material.
in terms of rate of convergence.Although the variance is quite high when fewer trajectories are considered, we remark that in some scenarios we can get good estimates of the parameters even with reduced number of trajectories.
In Tables 1-3, we provide further details about the results we have described in Figures 3-5, respectively.Indeed, for both algorithms, we report the estimated value of each parameter, in The bold values are used to highlight the best performance for each case across the 2 algorithms.
the column entitled the final value, of the space-time fractional diffusion equation (i.e., α, β, θ and D) and the final relative error, which is defined as the percentage of the discrepancy between the exact value and the estimated one, or ε ξ = | ξ −ξ | |ξ | × 100, where ξ and ξ are any estimated and original parameters, respectively.
We also report the parameter N (i)  90 , which we define as the minimum number of trajectories beyond which the accuracy using algorithm i is at least 90% and formally can be written as where ξ ∈ {α, β, θ , D} and A i ({X n (t l )} N n=1 ) denotes the output of the i-th Algorithm (i ∈ {1, 2}).By looking at the values reported in the column N (i) 90 , we can analyze the performance of both algorithms.For instance, we can remark that a dozen of trajectories is sufficient for both algorithms to achieve at least 90% accuracy in most of the settings.We note that a missing value for N (i)  90 could be an indicator of either a bias in the Algorithm i which is greater than 10%, or the situation that the maximum considered trajectories (N = 10 4 ) in our experiments are not The bold values are used to highlight the best performance for each case across the 2 algorithms.
sufficient enough for achieving accuracy more than 90%.Lastly, it is worth to note that increasing the number of trajectories will lead to a smaller final relative error.

Parameters Estimation of Fractional Brownian Motion
In this experiment, we present a study of fractional Brownian motion (fBm) process.More precisely, we consider a dataset of trajectories generated using the fBm model, then we parameterize the data according to the fractional diffusion equation under study in order to estimate the parameters of the fBm process generating the data (i.e., the Hurst exponent H and the diffusion constant D 0 ).From the Langevin equation associated with the fBm, we can remark that the fBm exhibits a long-time correlation which makes the process non-Markovian.The effective Fokker-Planck equation is given as The solution for the aforementioned equation is provided in Wang and Lung [72]  The bold values are used to highlight the best performance for each case across the 2 algorithms.
Thus, the fractional absolute moment of a fBm process with a parameter 0 < H < 0.5 is given as Based on the expressions (3) and ( 15), we can use the spacetime fractional diffusion equation under study to parameterize an fBm process.Indeed, by mapping the two expressions, we get α = 2, β = 2H, θ = 0 and D = D 0 2 .In this experiment, we assume that the given trajectories are following an fBm model and the task is to apply the proposed algorithms to identify the parameters associated with the fBm (H and D 0 ).
Last, in Figure 6, we present the simulation results associated with this experiment.The figure represents a panel of 4 × 4 plots where a row is associated with a set of parameters H and D 0 associated to a fBm process, and columns 1, 2, 3 and 4 are associated with the estimation of the parameters α, β, θ and D, respectively.Hence the estimation of the parameters H and D 0 .In these simulations, we generate N trajectories following four different fBm models using the built-in function wfbm in Matlab (i.e., in Figure 6, rows 1, 2, 3 and 4 are associated to fBm with Hurst exponents H = 0.25, H = 0.3, H = 0.35 and H = 0.4, respectively).As we can observe, Algorithm 2 (plotted using black line) can efficiently provide accurate estimates of the Hurst exponent and the diffusion constant and it performs better than Algorithm 1.The reason behind this is that Algorithm 1 requires solving a non-linear non-convex optimization problem which is not the case for the second algorithm.Also, we remark that finding the parameters of the fBm using Algorithm 2 does not require very large number of trajectories.In fact, we can determine an accurate estimates of the parameters (α, β) with an error close to 2% by performing Algorithm 2 on dataset of few trajectories obeying the fBm model (i.e., N ≤ 10).
From the simulation results related to fBm in Figure 6, we can see that Algorithm 2 is capable to accurately estimate the parameters α and β = 2H hence H.However, the estimation of the parameter D 0 is not accurate because of the nonconformity of the equation under study with the fBm model for all parameters values.Indeed, both models coincide, in terms of moments, only when α = 2, θ = 0 and β = 2H.Interestingly, in the current experiment, we focus more on determining an accurate estimate of the parameter H that is related to the long-time memory of an fBm process.Therefore, the proposed algorithms can be used as a tool to quantify the memory of an unknown process that follows the fBm model from given trajectories.
In Table 4, we report further elaboration of the results in Figure 6.In fact, we can remark that all the bold marked minimum relative errors are in the column associated with Algorithm 2, which endorse our previous claim about the effectiveness of the second algorithm.Related to the discussion of the estimation of α and β, we can see that the column N (1)  90 has a lot of missing values, and this indicates that using Algorithm 1 for these experiments lead to a final relative error that is greater than 10%.More interestingly, we can remark that with just around 10 trajectories, Algorithm 2 can provide an estimate of either α or β that is within the 10% error margin.

DISCUSSION
Understanding complex dynamics remains a challenging task when their generative model is unknown.This task is more complicated when it comes to analyze spatiotemporal kinetics and infer the model that dictate their evolution.Although, the physics that drive the dynamics are unknowns, datadriven based approaches are prominent tools to discover the physical laws/rules governing complex observed dynamics (from heterogeneous, sparse, scarce or even noisy data).Indeed, such a discovery plays a crucial role in diverse fields ranging from system biology, neuroscience, econophysics to social studies.Toward addressing this goal, in this manuscript, we have considered a generalized space-time fractional PDE and have developed an effective, rigorous and robust algorithmic strategies to estimate the parameters and so identify the main mathematical operators appearing in the PDE.In contrast to prior work, we investigated the effectiveness and robustness of the proposed algorithmic approach for estimating the correct parameters as a function of the available number of trajectories.From our simulation results, we observe that for all considered types of diffusion models except the classical one (i.e., all combination of the parameters α, β, θ , and D), a few number of recorded time-series (less than 100 trajectories) is required to attain the correct estimation of the PDE parameters with less then 2% confidence interval.For the case of the normal diffusion (i.e., except for the case α = 2 and β = 1), we may need more trajectories to achieve similar accuracy.Therefore, we hope that the proposed algorithms will help the community to better analyze complex spatiotemporal data, in order to unravel new physical laws in different applications (social networks, neuroscience, etc.) and decipher the causal interdependence between different processes.
Furthermore, we performed a study on the properties of fBm processes.We can remark from the Fokker Planck equation The bold values are used to highlight the best performance for each case across the 2 algorithms.
associated with the fBm process, mentioned in the manuscript, that the effective diffusion coefficient is time-dependent.Even though, this type of equation does not perfectly match the class of fractional diffusion equation that we are dealing with in this work, we applied the proposed algorithms to a dataset of trajectories generated according to the fBm model to retrieve the Hurst exponent and the diffusion constant.Indeed, we were able to identify these parameters since the absolute and signed absolute moments present similar structure as the ones calculated for a process that is generated according to the fractional diffusion equation under study.Therefore, we provide a new/alternative approach to quantify the memory of a fBm process.Note that without knowing the model that governs the dynamics of the process and using this framework, we can estimate the parameters α and θ to be equal to 2 and 0, respectively, but in the current stage we are not able to confirm that the data is generated either from a spacetime fractional diffusion equation with a constant generalized diffusion coefficient or from a fBm process with time varying diffusion coefficient.In future work, we will push forward the analysis to identify whether the diffusion coefficient is timedependent or it is a constant, and thus to differentiate between the space-time fractional diffusion equation and the Fokker Planck equation associated with the fBm process.
This mathematical formalism can be further developed and generalized to include additional operators and take into account advection phenomena as well as combined with other advanced statistics and information theory inspired methods to discriminate among various mathematical expressions (operators) in order to either identify the dominant physical phenomenon (or rule) governing the measurements or to determine the degree to which multiple physical laws contribute to the observed dynamics.Also, analyzing noisy data originated from real world applications will be taken into account in order to cope with complex scenario.We plan to build on these grounds, enrich the mathematical formalism and contribute to a significant paradigm shift in the context of data-driven discovery architectures of physical phenomena as well as enabling accurate predictions concerning complex evolving systems without requiring to know the regimes of variation for parameters, the types of mathematical operators or the fact that the data should be sampled at a particular level.

FIGURE 1 |
FIGURE 1 | Flowchart showing an AI architecture used to uncover hidden patterns governing complex phenomena: a data-driven based approach to estimate the parameters of the space-time fractional diffusion equation from spatiotemporal dynamics.

FIGURE 2 |
FIGURE 2 | The time-dependent theoretical and empirical absolute moments and signed absolute moments of order δ = 0.001, the time-dependent theoretical and empirical expected log absolute for the following four types of diffusion models: (A-C) Normal diffusion equation, (D-F) Neutral diffusion equation, (G-I) Space fractional diffusion equation, (J-L) Time fractional diffusion equation.
above-mentioned schemes (Algorithm 1 and Algorithm 2) to retrieve the parameters (α, β, θ and D) used during the data generation step.Figures 3-5 summarize several experiments done for different diffusion models (classical, neutral, space, and time diffusion), where we assume a set of combination of α, β and θ parameters for a generalized diffusion coefficient D = 1, 2 and D = 5, respectively.In each figure we present a panel of 4 × 4 different plots where a row represents the

TABLE 1 |
Numerical results for fractional diffusion parameter estimation with D = 1.

TABLE 2 |
Numerical results for fractional diffusion parameter estimation with D = 2.

TABLE 3 |
Numerical results for fractional diffusion parameter estimation with D = 5.

TABLE 4 |
Numerical results for fBm parameter estimation.