Analysis and prediction of improved SEIR transmission dynamics model: taking the second outbreak of COVID-19 in Italy as an example

This study aimed to predict the transmission trajectory of the 2019 Corona Virus Disease (COVID-19) and analyze the impact of preventive measures on the spread of the epidemic. Considering that tracking a long-term epidemic trajectory requires explanatory modeling with more complexities than short-term predictions, an improved Susceptible-Exposed-Infected-Removed (SEIR) transmission dynamic model is established. The model depends on defining various parameters that describe both the virus and the population under study. However, it is likely that several of these parameters will exhibit significant variations among different states. Therefore, regression algorithms and heuristic algorithms were developed to effectively adapt the population–dependent parameters and ensure accurate fitting of the SEIR model to data for any specific state. In this study, we consider the second outbreak of COVID-19 in Italy as a case study, which occurred in August 2020. We divide the epidemic data from February to September of the same year into two distinct stages for analysis. The numerical results demonstrate that the improved SEIR model effectively simulates and predicts the transmission trajectories of the Italian epidemic during both periods before and after the second outbreak. By analyzing the impact of anti-epidemic measures on the spread of the disease, our findings emphasize the significance of implementing anti-epidemic preventive measures in COVID-19 modeling.


Introduction
The COVID-19 pandemic has emerged as a significant global public health crisis, causing severe inflammation of the lungs due to a type of coronavirus.The virus's high transmission rate, severe infection outcomes, and unpredictable epidemic timeline have posed an ongoing threat to human health, causing significant damage to the global economy.As stated by WHO Director-General Dr. Tam Desai, this pandemic is a once-in-a-century health crisis, and its ramifications will last for decades.Therefore, understanding the epidemic's spreading mechanism (1, 2), analyzing the impact of anti-epidemic measures on its spread (3)(4)(5), and predicting its development trend and turning point have become critical issues (6)(7)(8).
Since the outbreak of COVID-19, modeling and analyzing the spread of the disease has become a significant area of focus.Common infectious disease models can be classified into SI, SIS, SIR, SIRS, and SEIR models depending on the specific disease's characteristics.While some researchers have used the SIR model to analyze the epidemic situation in various countries (9)(10)(11), it is worth noting that COVID-19 patients have an exposed period and are infectious, making the SEIR model a more appropriate choice for analysis.Cao (12) developed an enhanced SEIR model that incorporated measures such as medical tracking and quarantine, based on limited data available in the early stages of the COVID-19 outbreak.In Wang's model (13), infected individuals were classified into symptomatic and asymptomatic categories, and a network model was established to predict when Wuhan and its surrounding areas could resume economic activities.Tang (14) incorporated the contact rate and diagnosis rate dynamics into their model parameters.Li (15) estimated the proportion of symptomatic and asymptomatic infected individuals in China before January 23, as well as the ratio of symptomatic to asymptomatic infections.Zhang (16) introduced time-lag components and established a multi-stage time-delay dynamic model to investigate the impact of transportation on COVID-19 transmission.Zhong (17) developed a novel coronavirus pneumonia model based on transportation system dynamics and concluded that transportation has a positive feedback effect on the spread of COVID-19.Bag (18) utilized spatial statistical analysis to investigate the temporal and spatial patterns of COVID-19 transmission in India.Some researchers have employed advanced deep learning algorithms to construct artificial intelligence models for predicting the epidemic's development (19)(20)(21).However, classical SEIR model parameters are fixed and may not accurately reflect the current epidemic situation's evolving trend.As a result, due to changes in government anti-epidemic measures, improvements in medical care, and advancements in detection capabilities over time, dynamic model parameters are essential (22).Through extensive research and analysis of the current SEIR model based on COVID-19, it has been found that while most studies are capable of short-term predictions within a specified period (23,24), there is a lack of effective and accurate long-term prediction and anti-epidemic models.To address this issue, this study takes a comprehensive approach by considering various factors such as epidemic transmission characteristics (25), intervention measures (26), detection capabilities, and others, to enhance the classical SEIR model.Specifically, this model incorporates asymptomatic or mildly symptomatic infections and considers medical track quarantine, quarantine treatment, and other relevant measures in its construction.In order to capture the reality and complex dynamics of epidemic transmission more effectively, this model also incorporates distributed delays to define parameters such as contact rates and testing capabilities.Considering that certain parameters of the improved SEIR model require dynamic adjustments, regression algorithms and heuristic algorithms are utilized to calibrate these parameters, thereby enhancing the robustness of the model.The Italian epidemic (27,28) experienced a second outbreak in August 2020.A survey by the Italian Business Association found that 21.1 million Italians went on holiday, accounting for 40% of the total population, which may have contributed to the epidemic's resurgence (29).Thus, we analyze the Italian epidemic in two stages, with August 1st, 2020 as the cut-off point.The first stage pertains to the initial wave of the epidemic, while the second stage relates to the rebound after the resurgence.In this study, official announcement data from February to September 2020 was utilized.For each stage, a portion of the data was properly allocated as a training set to perform model parameter inversion.The data not included in the model training process was used to evaluate the predictive performance of the model and predicted the endpoint of the epidemic under these preventive measures.Additionally, sensitivity analysis experiments were conducted on the epidemicrelated parameters in the model to analyze the impact of anti-epidemic measures on the spread of the disease.

Model assumptions
It is essential to consider the divergence between the actual situation and the model.Therefore, this study incorporates the transmission characteristics of COVID-19 and makes several assumptions regarding the model, which are as follows: 1. Patients in the incubation period are able to infect susceptible individuals, while recovered patients are immune to reinfection due to the presence of antibodies.Furthermore, asymptomatic infections have a low mortality rate, and therefore, are not taken into consideration at present.Additionally, it is assumed that patients in quarantine areas are not infectious.2. We have chosen the scenario of "closing the city" in China's Hubei province as a template for our model.It should be noted that this model is structured as a closed node and does not account for population inflow/outflow or birth/death rates.While these factors may have an impact on the spread of COVID-19, they are not considered within the scope of this study.3. Firstly, it is assumed that during the early stages of the outbreak, all residents complied with the government's call to stay at home and wear masks while outside.Secondly, while most previous studies define the contact rate as a constant parameter, this does not accurately depict the dynamic implementation of antiepidemic measures in different countries.It is recognized that not all infected individuals can be promptly detected.
Considering that distributed delay plays a crucial role in capturing the time lag between infection, symptom onset, and subsequent transmission.Incorporating distributed delay in SEIR models helps to account for the variability observed in the incubation period, the time from infection to the development of symptoms, as well as other factors that influence disease progression and transmission (30).Therefore, Main novelty of this improved SEIR model is its ability to incorporate changes in contact rate c, detection capability T, and recovery rate through time varying deterministic and stochastic assumptions.
The specific expressions utilized are outlined below: In the equation, ∂ represents the maximum contact rate at the initial stage, while b denotes the minimum contact rate.Due to successful implementation of prevention and control measures, the likelihood of infected individuals coming into contact with susceptibles is significantly reduced, thus we assume that b = 0.The coefficients d 1 and d 2 , on the other hand, refer to the rate of change in detection capability.
During an epidemic, variations in medical treatment and human immunity can cause fluctuations in the daily recovery rate.In order to account for these fluctuations, we calculate the daily recovery rate using the following equation: R (t) represents the number of individuals who have recovered on a given day, while R (t-1) denotes the number of individuals who had recovered the previous day.Additionally, I (t-1) is the number of existing patients on the previous day.
To calculate the relationship between the daily recovery rate and time during an epidemic, we utilize Eq. ( 3) to determine the actual daily recovery rate.Next, we establish the following regression equation: The regression coefficients ε 0 , ε 1 , ε 2 are calculated by gradient descent method.

Improvement of SEIR model
In the traditional SEIR model, there are four categories of individuals: susceptible (S), exposed (E), infected (I), and recovered (R).However, this model has limitations in accurately simulating the actual epidemic spread.To address these limitations, we have improved the model by dividing the infected (I) category into two subcategories: asymptomatic infected (A) and symptomatic infected (I), as well as hospitalized (H) and deceased (D).Additionally, given the government's measures to track the contacts of infected individuals, we further divided those in close contact with infected individuals into two subcategories: quarantined susceptible (S q ) and quarantined exposed (E q ).
In our improved model, we define q as the track quarantine ratio, β as the probability of infection, and c as the contact rate.Close contacts are quarantined at a rate of q, and if they become infected, they are transferred to the quarantined exposed (E q ) category at a rate of βcq .Otherwise, they will be transferred to the quarantined susceptible (S q ) category at a rate of 1− ( ) β cq .Quarantined exposed individuals (E q ) are then transferred to hospital quarantine treatment at a rate of δ q , while quarantined susceptible individuals (S q ) are removed from quarantine and moved into the susceptible (S) category at a rate of λ .The population transformation relationships are illustrated in Figure 1.
An improved differential equation of SEIR transmission dynamics is as follows: Parameter settings are shown in Table 1.

Parameter estimation
Since the improved SEIR model relies on multiple parameter specifications provided in Table 1, with some parameters being dynamic and exhibiting significant variations across different states, the crucial task is to calibrate the relevant parameters within the model (31, 32).The primary objective is to ensure that the improved SEIR model is suitable for data from any given state.To achieve this goal, we utilize both regression algorithms and heuristic algorithms to estimate the model parameters based on actual data from February to May 2020.

The regression algorithm fits the daily recovery rate
The regression algorithm is a widely used supervised learning algorithm in machine learning.In our study, we employ the gradient descent algorithm to solve for the regression coefficients and identify the nonlinear relationship between the number of days and the daily recovery rate.Specifically, we assume that the functional relationship between time and recovery rate can be represented by the following equation: Within our equation, t represents the number of days, and  0 ,  1 , and  2 denote the regression coefficients.These coefficients are used to estimate the nonlinear relationship between time and the daily recovery rate.
In our study, we utilize the mean square error criterion to establish an expression for the loss function.This criterion is defined as the average of the squared distances between the predicted values and the actual values.By employing this criterion, we can accurately measure the deviation between predicted and actual values in our model.
Where J ε ( ) is the loss function, Y t ( ) is the sample observation value, γ t ( ) is the prediction value, and k is the number of data sets.
We derive the partial derivation of the loss function J ε ( ): The P j ( ) is the independent variable corresponding to the j coefficient.
Then we initialize ε randomly, and then iterate along the direction of negative gradient, so that the updated ε makes J ε ( ) smaller.The equation is as follows: Where α is the learning rate.When ε j drops to a certain point or a defined minimum value, it stops descending, and substitutes the ε j into the loss function to get the minimum value.The regression coefficient is estimated and the regression equation of γ is obtained.The fitting results are shown in Figure 2.
We calculated that γ is the overall recovery rate, but the recovery rate of the infected (I) and the infected (A) is different in reality.Therefore, we give reasonable assumptions based on the fitting effect of the model.

Heuristic algorithm to obtain model unknown parameters
Heuristic algorithms are applied to numerous real-world problems, allowing for the approximation of effective solutions in situations where guaranteeing the optimal solution is not feasible.Using heuristic-based models has contributed to a better understanding of the transmission trajectory of epidemics and increased the reliability of simulations (33).In this study, the Genetic Algorithm (GA) was utilized to calibrate the relevant parameters within the model.The objective of the algorithm is to optimize specific state-dependent parameters in order to enhance the agreement between the model and data.Each set of parameters for the SEIR model, along with the corresponding results, is referred to as an "agent." Each agent possesses a unique input set of state-dependent parameters, which serves as its genome.The genome is comprised by a list of the state-dependent parameters: initial maximum contact rate ( ∂ ), infection rate ( β ), track quarantine ratio ( q ), death rate of symptomatic infection ( α I ), death rate of patients in quarantine treatment ( α H ), the ratio of the propagation capacity of exposed to infected (υ ), and the regulator of infectious probability in asymptomatic infected ( θ ).
In order to optimize the parameters using a genetic algorithm, the agents' performance is evaluated based on their "fitness, " which measures how well the agent matches the available data.In this study, fitness is determined by comparing the SEIR model's results with the available data on cases, recoveries, and deaths.The fitness value is calculated using Eq.(11).
Z 1 is the official daily number of cumulative infected cases, Z 2 is the number of cumulative recovered cases, Z 3 is the number of cumulative deaths cases.
Specifically, we set value ranges and sampling intervals for the unknown parameter set ω = ∂ { } , q, , , , , H β α α υ θ I and utilized a genetic algorithm to randomly sample the parameters.The sampled values were then brought into the modified SEIR model, generating 2000 Monte Carlo simulations from the model.Using simulated data of the parameters as samples, we obtained the 95% confidence interval as the new sampling interval.Following further sampling using the genetic algorithm within the revised parameter intervals, the optimal value of the parameter set ω is revealed in Table 2.

Model verification
In this study, we utilized China's Hubei anti-epidemic methods as a reference to analyze and predict the second outbreak of the Italian epidemic.Due to the gradual stabilization of the COVID-19 situation in Italy in July 2020, this study first simulated and predicted the hypothesis that there will be no second outbreak of the COVID-19 in Italy.The cases recorded from February to May 2020 were selected as the training set to calibrate the parameters of the improved SEIR model, while the cases from June to July were used as data not used for model fitting to evaluate the predictive performance of the SEIR model.The analysis results, as depicted in Figure 3, indicate that in the absence of a second outbreak, the epidemic was largely under control by late July 2020.Based on our model prediction, the peak number of cumulative infected cases would be approximately 250,000 (95%CI: 218,000-287,000), the peak number of cumulative recovered cases would be around 210,000(95%CI: 186,000-239,000), and the peak number of cumulative deaths would be approximately 36,000(95%CI: 31,700-41,600).Furthermore, since February 24, 2020, the actual daily cumulative number of reported cases in Italy has been mostly within the 95% confidence interval of the model simulation results, highlighting the accuracy of our simulation and prediction.
Assuming that August 1st, 2020 marked the beginning of the second outbreak, we utilized the above model to conduct simulations from that day onwards.Due to intensified government anti-epidemic measures and changes in public perception compared to the initial phase of the epidemic,we refitted the model parameters using actual cases collected in August 2020.The cases collected in September were used as data not used for model fitting to evaluate the predictive performance of the SEIR model.The corrected parameters were then incorporated into the model.As illustrated in Figure 4A, according to this simulation model, the Italian epidemic situation would be largely under control by late October 2020, with the number of cumulative infected cases peaking on November 1st, 2020.Additionally, Figures 4B-D demonstrate the absolute error existing between the simulated data and actual data.
Based on the simulated and predicted results of the epidemic during the two mentioned time periods, the improved SEIR model eventually converges to a stable equilibrium state.This implies that as time progresses, the population sizes of each group reach a steady value, with no significant changes occurring.In this equilibrium state, the spread of the infectious disease is balanced with the rate of recovery or immunity, resulting in either the cessation of transmission or maintaining it at a low level within the population (34).This also indicates the existence of global solutions within this model (35).

The impact of anti-epidemic measures on the spread of the epidemic
To analyze the impact of anti-epidemic measures on the trend of the epidemic, we employ sensitivity analysis experiments within our model.Specifically, we observe the development of the epidemic in Italy since August 1st, 2020 through changes in the track quarantine ratio q.As illustrated in Figure 5, reductions in the proportion of track quarantine to 0.8q and 0.6q resulted in increases in the number of cumulative infections.Conversely, when the proportion of track quarantine was increased to 1.2q and 1.4q, the number of cumulative infections decreased accordingly.These findings demonstrate that strict medical track quarantine is an effective means of preventing the spread of the epidemic.As government interventions on the epidemic become more stringent, residents are encouraged to stay at home, reduce their number of trips, and practice personal protection measures.This results in a decrease in the dynamic variable for the contact rate of infected individuals, as shown in Eq. ( 1).The minimum contact rate b can be expressed as the final steady state of the infected persons' contact rate under anti-epidemic measures.Given that this model is based on China's strict anti-epidemic measures, we assume that the minimum contact rate is b = 0.However, it is important to consider the impact of changes in b on the epidemic under different anti-epidemic measures, as not all countries can implement measures as strict as China's.
Through simulation, we find that increasing the value of b leads to an increase in the number of cumulative infected cases during the second stage of the epidemic, as illustrated in Figure 6.When b reaches 0.6 and 0.8, there is a significant increase in the number of cumulative infected cases.However, when b equals 0.3, the increase is less noticeable.These findings suggest that reducing the flow of people can achieve a similar effect to a completely enclosed quarantine.Additionally, Figure 7 demonstrates that the time taken to reach the peak number of cumulative infected cases also increases with the rise of b.Therefore, by strengthening anti-epidemic measures and reducing population movement, it is possible to effectively lower the peak number of infections and bring the epidemic under control earlier.

Conclusion
This study proposes an improved SEIR epidemiological model to retrospectively analyze and predict the transmission trajectory of COVID-19 in Italy from February to September 2020.In this model, besides considering specific states for different population groups, dynamic parameter settings are incorporated by adding time varying deterministic and stochastic assumptions.This enables the model to better capture the realistic and complex dynamics of the epidemic spread.It is worth noting that some parameters of the improved SEIR model exhibit significant variations across different states.Therefore, regression algorithms and heuristic algorithms are used to calibrate these parameters, making the model applicable to data from any given state.By simulating and predicting the transmission trajectories during two periods, before and after the secondary outbreak in Italy, the numerical results demonstrate that the model possesses good long-term predictive performance and robustness.The model was then utilized to analyze the impact of anti-epidemic measures on the epidemic's development and predict future trends.The results indicate that: 1. Strict medical track quarantine is effective in restraining the epidemic's development.2. Reducing population movement, practicing personal protection, and avoiding contact with infected individuals can effectively reduce the peak number of infections, leading to earlier control of the epidemic.3. Lowering the proportion of population movement can achieve similar outcomes to complete lockdowns.
To curb the epidemic's growth, Italy must enhance anti-epidemic measures, reduce population movement, and promote public awareness of epidemic prevention measures to maintain the minimum contact rate of infected individuals within a small range.The simulation results of the epidemic situation in Italy without a second outbreak.The impact of changes in the minimum contact rate b on the number of cumulative infected cases.The impact of track quarantine on the spread of the epidemic.As this model was based on a closed space, it did not account for the inflow and outflow of populations between Italy and other countries.Thus, we plan to conduct further research by launching SEIR model simulation studies that take into consideration transportation, temporal and spatial distribution, population movement, and multiple patches.Through these analyses, we aim to develop an epidemic spread prediction and precise prevention control system based on temporal and spatial big data.This will provide more comprehensive insights into the dynamic nature of COVID-19 transmission, allowing authorities to formulate more effective measures to prevent and curb its spread.

FIGURE 1
FIGURE 1Population transformation diagram of the improved SEIR model.
the ratio of the number of infected (I) to infected (A), the results can be calculated:

FIGURE 2
FIGURE 2Fitting effect graph based on Italian daily recovery rate data.

FIGURE 4 (
FIGURE 4 (Continued) The simulation results of the epidemic situation in the case of a second outbreak in Italy.(A) Simulation and prediction of the second outbreak in Italy.(B) Absolute error of cumulative infected cases.(C) Absolute error of cumulative recovered cases.(D) Absolute error of cumulative dead cases.

TABLE 1 Improved
SEIR transmission dynamics differential equation parameter table.

TABLE 2
Confidence interval and values of parameters.