Entropy-Based Pandemics Forecasting

Many natural phenomena follow well-defined statistical distributions. In epidemiology, as the current COVID-19 outbreak shows, it is essential to develop reliable predictions of how an infectious disease will evolve. In particular, a statistical projection of the time of maximum diffusion of infected carriers is fundamental for preparing healthcare systems and organizing a robust public health response. In this paper, we develop a thermodynamic approach based on the infection statistics related to the total citizenry of a country. It represents a novel tool for evaluating the time of maximum diffusion of an epidemic or pandemic.


INTRODUCTION
In the natural, social, economic, and physical sciences, a large variety of phenomena are characterized by regularities that can be described analytically by a defined statistical distribution [1]. Consequently, in any field of research, scientists and engineers have always sought the statistical distribution that best predicts a system's behavior. This is particularly true in epidemiology. Indeed, epidemics can occur in a community or region by causing illness in excess of normal expectancy; a pandemic is no more than a large-scale, global epidemic that causes a growth in morbidity and mortality over a wide geographic area [2,3].
Some recent examples are the 2003 SARS (Severe Acute Respiratory Syndrome) outbreak, the 2014 West Africa Ebola epidemic, and the present COVID-19 pandemic caused by the coronavirus SARS-CoV-2. Moreover, epidemics and pandemics can also cause significant, widespread economic hardship and potentially lead to social unrest. Consequently, the interest in forecasting the diffusion of such global infectious disease threats is continuously increasing [2,4,5].
To implement effective public health measures in a timely manner and allocate scarce resources according to geographic need, it is very important to forecast the diffusion or spread of the infection amongst the population. Consequently, it is fundamental to develop a reliable analytical approach that allows such predictive modeling.
Traditionally, epidemiological analyses are based on sigmoidal models, which are indeed useful if the evolution of the epidemic follows well-established patterns. However, especially at the beginning of any epidemic, we have only partial access to validated data, in part because the number of infected people is still rather small and follows a dynamic process. Scientists and engineers have always searched for the statistical distribution that best predicts the behavior of the systems under consideration [6,7]. Indeed, the usual statistical approach is based on Kolmogorov's law of large numbers, which requires the existence of a finite first moment, and on Lyapunov's version of the central limit theorem, which assumes the existence of a finite moment of order higher than two. However, when the data are drawn from a heavy-tailed distribution, the mathematical basis of the usual statistics is not satisfied. The existence of specific finite moments is closely related to the concept of a tail index, and its estimation is one of the key problems in statistics. At present, there are a great number of estimators of the tail index [8-16], but a generic approach is required in order to extend the statistical approach to complex systems, such as epidemics or pandemics.
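To make the notion of a tail index concrete, the following minimal sketch applies the Hill estimator, one classical tail-index estimator, to synthetic Pareto data. The source does not say which of the cited estimators is meant, so the choice of the Hill estimator, the function name `hill_tail_index`, the sample size, and the cut-off `k` are all illustrative assumptions. With a true tail index of 1.5, the distribution has a finite mean but an infinite variance, which is exactly the situation in which the usual central-limit arguments fail.

```python
import math
import random


def hill_tail_index(samples, k):
    """Hill estimate of the tail index, using the k largest samples.

    Illustrative helper (not from the source): the estimate is the inverse
    of the mean log-excess of the k largest values over the (k+1)-th largest.
    """
    xs = sorted(samples, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    threshold = xs[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the k+1 largest samples must be positive")
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess


# Synthetic heavy-tailed data (assumed parameters, not epidemiological data).
rng = random.Random(0)
true_alpha = 1.5  # finite mean, infinite variance
data = [rng.paretovariate(true_alpha) for _ in range(20000)]

estimate = hill_tail_index(data, k=2000)
print(f"Hill estimate of the tail index: {estimate:.3f}")
```

The estimate depends on the choice of `k`, which is one reason why so many alternative estimators exist.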
Furthermore, the spread of infection can be studied as the evolution of an open thermodynamic system. In this context, we note that Jaynes developed a non-equilibrium statistical mechanics approach for the stationary state constraint, on the basis of the principle of maximum entropy [17,18]. He maximized the Shannon information entropy over the pathways followed in the thermodynamic phase space, with the probabilities subject to the actual constraints [19]. This yields the most probable macroscopic pathway, i.e., the one realized by the greatest number of microscopic paths compatible with the imposed constraints [19-24]. Entropy has proven to be a key quantity in the analysis of some biosystems [25-32].
In this paper, we therefore extend a thermodynamic approach for complex systems to the analysis of epidemics by introducing entropy as a tool to predict the evolution of an infectious disease.
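For readers unfamiliar with the principle of maximum entropy, the textbook form of the constrained maximization can be sketched as follows. The index $i$ (labelling microscopic paths), the generic observable $A_i$, its prescribed mean $\langle A \rangle$, and the multipliers $\lambda_0$, $\lambda$ are notation introduced here for illustration only; they do not appear in the source.

```latex
% Maximize the Shannon entropy subject to normalization and one mean-value constraint
\begin{aligned}
\mathcal{L} &= -\sum_i p_i \ln p_i
  \;-\; \lambda_0 \Big( \sum_i p_i - 1 \Big)
  \;-\; \lambda \Big( \sum_i p_i A_i - \langle A \rangle \Big), \\
\frac{\partial \mathcal{L}}{\partial p_i} &= -\ln p_i - 1 - \lambda_0 - \lambda A_i = 0
  \quad\Longrightarrow\quad
  p_i = \frac{e^{-\lambda A_i}}{Z}, \qquad Z = \sum_i e^{-\lambda A_i}.
\end{aligned}
```

The normalization constraint fixes $Z$, and the mean-value constraint fixes $\lambda$; the resulting distribution is the least biased one compatible with the imposed constraints.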

MATERIALS AND METHODS
First, we must consider a reference statistic for the probability that a person becomes infected. To do so, we consider the recent results obtained with the logistic approach by Loum et al. [33]: the cumulative probability of infection P follows a logistic shape as a function of the time t, governed by two constants, α and β. The shapes of the SARS-CoV-2 expansion for China, the USA, Italy, and Spain are shown in Figure 1.
On the other hand, in relation to the probability of infection, by following the usual statistical thermodynamic approach, we can define the Gibbs dimensionless entropy (Equation 2) [34,35] as a function of $f$, the frequency of infected people in the total citizenry of the country considered, $f = n(t)/n_{\mathrm{tot}}$, where $n(t)$ is the number of infected people at time $t$ and $n_{\mathrm{tot}}$ is the population at time $t$.
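As an illustration of a logistic shape governed by two constants, the sketch below uses the common parameterization $P(t) = 1/(1 + e^{-(\alpha + \beta t)})$. This specific form is an assumption, since the exact expression from [33] is not reproduced in this text, and the numerical values of `alpha` and `beta` are hypothetical.

```python
import math


def logistic_probability(t, alpha, beta):
    """Cumulative probability P(t) = 1 / (1 + exp(-(alpha + beta * t))).

    ASSUMED two-parameter logistic form; the expression in [33] may be
    parameterized differently.
    """
    return 1.0 / (1.0 + math.exp(-(alpha + beta * t)))


# Hypothetical constants chosen only to draw a plausible curve.
alpha, beta = -6.0, 0.25

# In this parameterization the curve is steepest where alpha + beta * t = 0.
inflection_time = -alpha / beta

for day in range(0, 49, 8):
    print(f"day {day:2d}: P = {logistic_probability(day, alpha, beta):.4f}")
print(f"inflection at day {inflection_time:.1f}")
```

In this form, β sets how quickly the curve rises and −α/β sets the time at which the probability reaches one half.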
For any system, the most probable state is reached when the entropy (Equation 2) reaches its maximum, so, in relation to epidemics/pandemics, we expect that the maximum diffusion or expansion of the infectious disease occurs at the maximum entropy.
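A minimal numerical sketch of this maximum-entropy criterion follows. It assumes the single-term dimensionless form $S = -f \ln f$, which is an assumption made here because the displayed Equation 2 is not reproduced in this text; the function names are likewise illustrative. Under that assumed form, $dS/df = -\ln f - 1$, so the entropy increases with $f$ as long as $f < 1/e$ and peaks at $f = 1/e$.

```python
import math


def infected_frequency(n_infected, n_total):
    """Frequency f = n(t) / n_tot of infected people in the population."""
    if n_total <= 0 or not 0 <= n_infected <= n_total:
        raise ValueError("require 0 <= n_infected <= n_total and n_total > 0")
    return n_infected / n_total


def dimensionless_entropy(f):
    """ASSUMED entropy form S = -f ln f, with S = 0 at f = 0 by continuity."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return 0.0 if f == 0.0 else -f * math.log(f)


# Locate the maximum of the assumed entropy on a grid of frequencies.
grid = [i / 10000 for i in range(10001)]
f_at_max = max(grid, key=dimensionless_entropy)
print(f"entropy is largest near f = {f_at_max:.4f} (1/e = {1 / math.e:.4f})")

# Hypothetical example: 5,000 infected people in a population of 1,000,000.
f_example = infected_frequency(5_000, 1_000_000)
print(f"f = {f_example:.4f}, S = {dimensionless_entropy(f_example):.5f}")
```

Because realistic infection frequencies are far below $1/e$, the maximum of interest in practice is the maximum of the entropy as a function of time, not of $f$.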

RESULTS
Entropy is a function that allows us to determine the time of maximum diffusion or spread of the infection. In order to use such a thermodynamic approach, we must obtain medical data, usually collected by the health authorities. However, at the start of an outbreak, available data are scarce, and so we can obtain only the tail of the entropy function; still, we must try to obtain a best fit of the entropy shape by using at least 5-8 days of observational data to evaluate the interpolation function by a Taylor power-series expansion of the tail [36,37].
Once we have obtained the function fitting the entropy shape vs. time, we can forecast the maximum of the entropy and, consequently, the corresponding time point of maximum infections amongst the citizenry. In summary, the epidemiological forecasting tool that we suggest consists of:
• Finding the occurrence frequency distribution in time;
• Finding the cumulative value of the occurrence frequency distribution in time;
• Evaluating the entropy through Equation (2);
• Evaluating the best fit for the entropy obtained at the previous point;
• Determining its maximum and the related time, directly from the shape or by mathematical methods [36,37].
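The five steps above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: the entropy form $S = -f \ln f$ is assumed, the "best fit" is taken to be a least-squares quadratic (a second-order truncation of a power series), the helper names are hypothetical, and the eight daily counts are synthetic rather than real surveillance data.

```python
import math
from itertools import accumulate


def entropy(f):
    """ASSUMED entropy form S = -f ln f (Equation 2 is not shown in this text)."""
    return -f * math.log(f) if f > 0 else 0.0


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(ts, ys):
    """Least-squares fit y ~ a*t^2 + b*t + c via the normal equations."""
    s = [sum(t ** k for t in ts) for k in range(5)]                  # sums of t^k
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]  # sums of y*t^k
    m = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [r[2], r[1], r[0]]
    d = det3(m)
    if d == 0:
        raise ValueError("need at least three distinct time points")
    coeffs = []
    for col in range(3):  # Cramer's rule, one coefficient per column
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = rhs[i]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)  # (a, b, c)


def forecast_peak_day(daily_new_cases, population):
    """Return the day (counted from the first observation) of maximum fitted entropy."""
    freqs = [n / population for n in daily_new_cases]   # step 1: occurrence frequency
    cumulative = list(accumulate(freqs))                # step 2: cumulative frequency
    entropies = [entropy(f) for f in cumulative]        # step 3: entropy
    days = list(range(len(entropies)))
    a, b, _ = fit_quadratic(days, entropies)            # step 4: best fit (assumed quadratic)
    if a >= 0:
        raise ValueError("fitted curve is not concave, so it has no maximum")
    return -b / (2 * a)                                 # step 5: time of the maximum


# Synthetic example: eight days of slowly declining daily counts.
daily = [1000 - 50 * i for i in range(8)]
peak_day = forecast_peak_day(daily, population=1_000_000)
print(f"forecast maximum about {peak_day:.1f} days after the first observation")
```

With only a short observation window the vertex is an extrapolation, so the forecast should be recomputed as new daily data arrive.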
To demonstrate the utility of the model, we show the evolution of entropy for the USA, China, Italy, and Spain in Figure 2 (using the data summarized in Table 1); depicted is the interpolation function used to evaluate the maximum entropy, which in turn relates to the time of maximum SARS-CoV-2 infection among a country's citizens. We can highlight that:
• April 28th, which is prospective at the time this manuscript has been submitted;
• As an example that this approach is also applicable at a higher spatial granularity, for New York City (Table 2 and Figure 3) the maximum is forecast to occur 38 days after March 17th, i.e., around April 25th, also prospective at this point.
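The date arithmetic behind the New York City forecast can be checked directly. The year 2020 is an assumption (the text gives only the day and month), although the result does not depend on it because the interval spans only March and April.

```python
from datetime import date, timedelta

# Reference date quoted in the text; the year is assumed for illustration.
reference_day = date(2020, 3, 17)
offset_days = 38

forecast_day = reference_day + timedelta(days=offset_days)
print(forecast_day.isoformat())  # 2020-04-24
```

A 38-day offset therefore lands on April 24th, consistent with the "around April 25th" stated above.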

DISCUSSION
The method suggested here is a novel thermodynamic approach for forecasting large-scale infectious disease outbreaks based on the maximum entropy variation, obtained by using an occurrence frequency approach for a finite-size statistical population. There are some thermodynamic applications to epidemiology but, in contrast to the approach introduced here, these previously reported concepts are based on the SIS dynamic model and on maximum entropy [38]. While generally intriguing from a mathematical perspective, these models are strongly dependent on the statistics used, because the basic reproduction number they introduce is a valid predictor in structured populations only when the population size is infinite [39], which is the usual constraint of a great number of statistics. We have therefore developed an approach based on fitting the entropy in order to obtain an empirical-like approximation of the spontaneous occurrence of epidemics/pandemics. In this way, we analytically describe the expansion of an infectious disease without introducing any a priori statistics. In contrast to other non-statistical thermodynamic models [39,40], we refrain from introducing any variables or rate evaluations and fit only the shape of the Gibbs entropy; as such, we obtain the real empirical behavior, as it unfolds, without any of the restrictions that a mathematical model imposes in the other approaches [39].
Frontiers in Physics | www.frontiersin.org
We note that our approach, much like any other, depends on the availability of reliable diagnostic testing, which has been deployed heterogeneously across countries and regions with regard to test modality, availability, and accuracy. Better test performance and the forthcoming availability of longitudinal data from ongoing population studies in the EU and the US would be desirable. Still, based on currently available data and regardless of their limitations, our model has already accurately predicted the date of maximum expansion of coronavirus infections in countries such as Italy and Spain.
In conclusion, we have obtained a novel, useful tool to aid much-needed projections in large-scale infectious disease outbreaks, based only on an applied physical approach. Most importantly, the utility of the model has been confirmed in the context of the current COVID-19 pandemic caused by SARS-CoV-2.

AUTHOR CONTRIBUTIONS
UL: conceptualization, methodology, formal analysis, investigation, writing - original draft, writing - review & editing, supervision, project administration, and funding acquisition. GG: resources, writing - review & editing, visualization, data curation, and validation. TD: conceptualization, methodology, investigation, writing - original draft, supervision, and writing - review & editing. All authors contributed to the article and approved the submitted version.