Toward the Impact of Non-pharmaceutical Interventions and Vaccination on the COVID-19 Pandemic With Time-Dependent SEIR Model

The outbreak of COVID-19, caused by the SARS-CoV-2 coronavirus, has been declared a pandemic by the World Health Organization (WHO) in March, 2020 and rapidly spread to over 210 countries and territories around the world. By December 24, there are over 77M cumulative confirmed cases with more than 1.72M deaths worldwide. To mathematically describe the dynamic of the COVID-19 pandemic, we propose a time-dependent SEIR model considering the incubation period. Furthermore, we take immunity, reinfection, and vaccination into account and propose the SEVIS model. Unlike the classic SIR based models with constant parameters, our dynamic models not only predicts the number of cases, but also monitors the trajectories of changing parameters, such as transmission rate, recovery rate, and the basic reproduction number. Tracking these parameters, we observe the significant decrease in the transmission rate in the U.S. after the authority announced a series of orders aiming to prevent the spread of the virus, such as closing non-essential businesses and lockdown restrictions. Months later, as restrictions being gradually lifted, we notice a new surge of infection emerges as the transmission rates show increasing trends in some states. Using our epidemiology models, people can track, timely monitor, and predict the COVID-19 pandemic with precision. To illustrate and validate our model, we use the national level data (the U.S.) and the state level data (New York and North Dakota), and the resulting relative prediction errors for the infected group and recovered group are mostly lower than 0.5%. We also simulate the long-term development of the pandemic based on our proposed models to explore when the crisis will end under certain conditions.

The outbreak of COVID-19, caused by the SARS-CoV-2 coronavirus, has been declared a pandemic by the World Health Organization (WHO) in March, 2020 and rapidly spread to over 210 countries and territories around the world. By December 24, there are over 77M cumulative confirmed cases with more than 1.72M deaths worldwide. To mathematically describe the dynamic of the COVID-19 pandemic, we propose a time-dependent SEIR model considering the incubation period. Furthermore, we take immunity, reinfection, and vaccination into account and propose the SEVIS model. Unlike the classic SIR based models with constant parameters, our dynamic models not only predicts the number of cases, but also monitors the trajectories of changing parameters, such as transmission rate, recovery rate, and the basic reproduction number. Tracking these parameters, we observe the significant decrease in the transmission rate in the U.S. after the authority announced a series of orders aiming to prevent the spread of the virus, such as closing non-essential businesses and lockdown restrictions. Months later, as restrictions being gradually lifted, we notice a new surge of infection emerges as the transmission rates show increasing trends in some states. Using our epidemiology models, people can track, timely monitor, and predict the COVID-19 pandemic with precision. To illustrate and validate our model, we use the national level data (the U.S.) and the state level data (New York and North Dakota), and the resulting relative prediction errors for the infected group and recovered group are mostly lower than 0.5%. We also simulate the long-term development of the pandemic based on our proposed models to explore when the crisis will end under certain conditions.

INTRODUCTION
On March 11, 2020, the World Health Organization (WHO) declared that the outbreak of the novel coronavirus  can be characterized as a pandemic. The COVID-19 outbreak started in Wuhan, China in December, 2019. By the end of January, 2020, the confirmed cases in China went up to 11, 791. Only 1 month later, the number increased nearly seven-fold to 80, 134 and the COVID-19 cases gradually showed up in other countries. Starting from March, 2020, the outbreak spread to more than 100 countries. By the end of 2020, the pandemic has led to 77.5M confirmed cases and more than 1.72M fatalities worldwide. Figure 1 summarizes the percentage of global confirmed cases contributed by each country. As of December 24, the United States, India, and Brazil are the three countries most impacted by the COVID-19 pandemic. The trajectories of the confirmed cases in the three countries are also displayed.
The COVID-19 virus has caused a great disruption to the human health, social life, developments, and economics. To stop the spread of COVID-19 virus, governments have carried out numerous preventive measures such as stayat-home orders, travel restrictions, school closure, maskwearing mandate, and so forth. The impact on the society came later in all aspects, including rising unemployment, protests against restrictions, and psychological anxiety and stress brought to the public. However, a significant decrease in the transmission rate occurred, which proved that these mitigation measures were effective. Months later, many states in the U.S. have loosened their restrictions and lifted orders to allow businesses to reopen to the public. Consequently, the diagnoses of daily confirmed cases have displayed a consequential increasing trend after the reopen in some states such as Alabama. By looking at the numbers only, it is difficult to assess what stage we are at in the COVID-19 pandemic and when it is going to end. Hence, mathematical models considering the epidemiological characteristics of COVID-19 become crucial and significant to track and forecast the trend of the spread. The classic epidemiology model exhibits compelling results, especially during the early period of the pandemic. The compartmental models, which are the simplified versions of mathematical models for infectious diseases, divide the population into different compartments between which people may progress. Different diseases are represented by different compartmental models (Schmidt, 1981;Sharomi and Gumel, 2011;Gao et al., 2016). The Susceptible-Infectious-Recovered (SIR) model, as one of the simplest and most classic compartmental models, characterizes the dynamic changes in each compartment using ordinary differential equations. There are three compartments in this model: susceptible (S), infectious (I), and recovered/deceased (R). The number of individuals in each compartment varies over time. The deterministic SIR and its derivatives are widely used to predict infectious deceases like COVID-19 Katul et al., 2020;Toda, 2020). Besides compartmental models, statistical learning techniques are also widely used in biomedical fields (Zheng et al., 2018Hsieh and Zheng, 2019;Ganyani et al., 2020;Murray, 2020;You et al., 2020). For example, IHME team (Murray, 2020) employed a statistical model to predict the number of deaths, the demand of hospital beds, ICU beds and ventilators in a few months.
In this paper, we develop a time-dependent Susceptible-Exposed-Infectious-Recovered (SEIR) model with coefficients estimated by Least Absolute Shrinkage and Selection Operator (LASSO) regression. This model is inspired by the SIR model and takes the existence of incubation period (the time from exposure to development of symptoms) into consideration. The individuals who have been infected but are not yet infectious are labeled as exposed (E). Instead of the constant parameters used in traditional SIR based models, we propose to model the dynamic with time-dependent parameters. Additionally, we extend our SEIR model to accommodate other crucial factors such as immunity, reinfection, and vaccination cases into account. With the epidemiology models, we aim at answering the following questions: • What is the trajectory of transmission rate, incubation rate, and recovery rate? • Has the inflection point been reached. If so, when?
• How does the reopen order affect the spread of the pandemic? • How do reinfection and vaccination affect the pandemic? • When will the mortality reach the peak?
• How many cases do we expect to have when the pandemic is over?
The remainder of the paper is organized as follows: we build the time-dependent SEIR model in section 2. Then we extend the model to include the vaccinated group as well as analyze the asymptotic stability of its disease-free equilibrium in section 3. To validate our model, we perform numerical analysis, prediction, and model simulation using national level data of the United States, and the state level data of two selected states, New York and North Dakota. The results are presented in section 4. Lastly, we conclude this paper in section 5.

THE TIME-DEPENDENT SEIR MODEL
Our proposed SEIR model with time-dependent parameters describes the transmission dynamic of an epidemic. It is assumed that there are totally four states in which an individual would experience: susceptible, exposed, infected, and recovered. In the susceptible state, the individual does not have the disease but can be infected by someone infectious through an effective contact. Once being infected, the individual moves to the exposed state. The exposed individual is not able to infect others until the incubation period is over. Eventually, the infected individual recovers from the disease. Altogether the four groups of individuals at different states compose the entire population and we denote the number of individuals in each group at time t by S(t), E(t), I(t), and R(t). In this model, a person is assumed to be immune to the virus after recovery and will not return to the susceptible state. Accordingly, the number of deaths caused by the disease is also counted in the recovered group R(t) since neither of the recovered and dead has any more impact on the spread of the virus. The differential equations that govern the trajectories of the four compartments are formulated as: with a constant total population N, and therefore, we have: Three time-dependent parameters, the transmission rate β t , the transition rate σ t , and the recovery rate γ t are introduced in this model, which are all assumed to vary with respect to time. The descriptions and empirical ranges are listed in Table 1.
The proportion of susceptible and infected individuals in the population at time t are S(t) N and I(t) N , respectively. Given the transmission rate β t , which describes the flow of susceptible becoming exposed to the virus, and the total population N, the number of newly exposed people is β t S(t)I(t) N . Later, the exposed individuals make the transition to the infected state at the transition rate σ t , which is the inverse of the incubation period. The number of exposed individuals who complete the transition at time t is σ t E(t). Similarly, people recovered at time t is γ t I(t), given the recovery rate γ t , which is the number of individuals recover from the infected state per person per time.

Discrete Time-Dependent SEIR Model
Since the COVID-19 case report is updated daily, we revise the differential Equations (1)-(4) into discrete time difference equations as follows: with the four variables satisfying (5) and Assuming historical data for a certain time period 0 ≤ t ≤ T is available, i.e., we have {S(t), E(t), I(t), R(t)|0 ≤ t ≤ T}. By deduction from (7) to (10), we can compute historical values of the parameter series {β t , σ t , γ t |0 ≤ t ≤ T − 1} using the following formulas: Now predicting future values of the parameters {β t , σ t , γ t |t ≥ T} given historical values can be converted to a regression problem. There are several approaches predicting future values of the timedependent parameters. For instance, we can use linear models (e.g., linear regression), nonlinear methods (e.g., spline), or time series models (e.g., autoregressive model), etc. In this subsection, we fit the following LASSO regression models: where I, J, and K are the orders of the autoregressive process, and These coefficients are determined by minimizing the following loss functions, which are composed of the residual sums of squares (RSS) and regularization terms: λ β , λ σ , and λ γ are the regularization parameters deciding the penalty to the flexibility of model, and all regularization parameters can be optimized by cross-validation.

Estimating the ExposedÊ(t), Infectionŝ I(t), and RecoveredR(t) Groups
Given the historical data {S(t), E(t), I(t), R(t), 0 ≤ t ≤ T}, we first compute the time-dependent parameter series {β t , σ t , γ t , 0 ≤ t ≤ T − 1} introduced in section 2.1. Then we predict future values {β t ,σ t ,γ t , t ≥ T} using the model built in section 2.2. According to (8), (9), (10), and (5), we can further predict the number of cases for the future as follows: Frontiers in Artificial Intelligence | www.frontiersin.org σ t Transition rate from exposed to infections at a given time Note that for the special case when estimating (21), (22), (23), and (24). The detailed steps of the entire procedure are summarized in Algorithm 1.

SEIR VARIATION CONSIDERING IMMUNITY, REINFECTION, AND VACCINATION
The human immune system protects the body against diseases with two parts. The first part, known as the innate immune response, includes the release of chemicals that cause inflammation and white blood cells that can destroy infected cells. It is always ready to take actions as soon as any foreign invader is detected inside the body. However, this part is not specific to coronavirus. It will not learn and develop immunity to the virus. Instead, the second part: the adaptive immune response produces targeted antibodies that can stick to the virus and stop the spread to the body. The T cells 1 would attack the cells infected by the virus. Existing research shows that most COVID-19 patients had an antibody response at 10 days or later after onset of symptoms . If the adaptive immune response is powerful enough, it could leave a lasting memory of the infection that will provide protection in the future. Other findings also suggest that strong responders (with higher antibody level) are significantly higher in severe patients, while it is unclear whether the asymptomatic or mildly symptomatic patients will develop sufficient adaptive immune response and gain immunity to the disease after recovery (Tan et al., 2020). In fact, there have been several reported cases of COVID reinfection in China, Hong kong, Belgium, the Netherlands, and the U.S. (Tan et al., 2020), and the reinfection case are indeed increasing. This implies the necessity of taking reinfection into consideration.
On the other hand, the worldwide endeavor to create a safe and effective COVID-19 vaccine is beginning to bear fruit. A wide variety of vaccines has already been authorized around the globe while many more remain in development. According to the U.S. CDC, as of December 13, 2020, the Pfizer-BioNTech COVID-19 vaccine has been authorized and large-scale (Phase 3) clinical trials are in progress or being planned for three other vaccines in the United States. Currently the supply of COVID-19 vaccine in the U.S. is limited, but it will increase in the upcoming weeks and months. Once large quantities are available, the increasingly large-scale vaccination will have a substantial impact on the pandemic.

The Time-Dependent SEVIS Model
To take the factors of immunity, reinfection, and vaccination into account, we modify the proposed SEIR model by removing the recovered group R(t) and adding a vaccinated group V(t), which represents the vaccinated individuals. In this susceptible, exposed, vaccinated, and infected modeling framework, the previous assumption for the SEIR model that an infected individual will not become susceptible again after recovery is no longer employed. Instead, we assume that a fraction of the infected individuals gain immunity after recovery through producing antibodies while the rest return to the susceptible state. The former is counted in the V(t) group along with the vaccinated individuals since, epidemiologically speaking, both are immune to the virus and can no longer be infected. The new SEVIS model is governed by the following differential equations: with a constant total population N, and therefore, we have: The parameter settings of the transmission rate β t , the transition rate σ t , and the recovery rate γ t remain the same as in the SEIR model. The vaccination rate v t is low at the beginning of vaccine administration and gradually increasing as supply is growing. w ∈ [0, 1] is the fraction of infected cases that become immune after recovery. In addition, we assume it to be constant in this model. Hence, the number of infected individuals recover at time t is γ t I(t), and wγ t I(t) join the V(t) group while (1 − w)γ t I(t) fail to gain immunity and return to the susceptible state S(t).

Baseline Epidemiological Parameters
In previous studies, the transmission rate, β (as a constant), ranges from around 0.5 to 1.5 per person per day (Ngonghala et al., 2020;Read et al., 2020;Shen et al., 2020) and decreases as time goes. Based on existing literature, the incubation period (the time from exposure to development of symptoms) of COVID-19 and other coronaviruses ranges from 2 to 14 days. On average, symptoms show up in the newly exposed person about 5.1 days after contact (Fairoza Amira et al., 2020;Ngonghala et al., 2020). Thus, the transition rate, which is the inverse of the incubation period, is estimated to be 1 5.1 .

Basic Reproduction Number and Asymptotic Stability of Disease-Free Equilibrium
In this subsection we give the closed-form expression for the time-dependent basic reproduction number of the SEVIS model using the next generation operator method (Diekmann et al., 1990;van den Driessche and Watmough, 2002). The basic reproduction number R 0 is defined as the average number of secondary infections caused by a single infectious individual who enters an entirely susceptible population. That actually is the special case where all parameters and compartments are at their initial state at time t = 0. Since we propose the parameters to be time-dependent in our model, we revise the basic reproduction number to a time-dependent version R t as well. When R t > 1, the infection will be able to start spreading in the population and develop into an epidemic. Generally speaking, it is more difficult to control the epidemic with the larger the value of the basic reproduction number. Let X be the vector of infected classes and Y be the vector of uninfected classes. For the SEVIS model (25)-(28), we have: Next we define the matrix of new infection terms F , which only includes the flow from X to Y, and matrix of all other terms V, which includes flows within X and flows leaving the system. For each compartment, in-flow in V is negative and out-flow in V is positive.
The next generation matrix is defined as FV −1 where: The disease-free equilibrium (DFE) of the SEVIS model is given by: (S * , E * , V * , I * ) = (N, 0, 0, 0), and we have Therefore, the next generation matrix is: R t , the basic reproduction number at time t, is given by the dominant eigenvalue of FV −1 : Similarly, we can obtain the same basic reproduction number for the time-dependent SEIR model. The DFE is locally asymptotically stable if R t < 1, and unstable if R t > 1.

NUMERICAL RESULTS, PREDICTIONS, AND SIMULATIONS
In this section, we will give the numeric results obtained by implementing Algorithm 1 on the national level data of the  United States (US) as well as the state level data of a few representative states. In spring 2020, the New York Metropolitan Area experienced the largest COVID-19 outbreaks. As thousands of cases were being confirmed daily in New York, the state was the epicenter of the nation's crisis and on a different scale than the rest of the country. Though some new batches of hotspots have emerged across the country during the past months, the state of New York (NY) is still a region worth studying. On the other hand, as of December 24, a pack of northern states close to the Canada-US border have the highest percentages of cumulative confirmed cases in their populations as shown in Figure 2. The top one, North Dakota, has 11.94% of its population infected cumulatively, followed by South Dakota (10.69%), Wisconsin (8.61%), and some other nearby states. In this case, as a representative of this particular area, we take North Dakota (ND) as another example to illustrate our algorithm. We used the dataset that was collected from the COVID-19 data repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (Dong et al., 2020) and the nCov2019 R package . The dataset contains time series of the numbers of confirmed cases, recovered cases and deaths up to December 24, 2020. The starting date of the training set used for model training varies according to the actual spread of the pandemic in each of the three regions: US, NY, and ND. For each region, a different start date of training set is chosen for model fitting according to the time when a relatively clear trend emerges. Figures 3, 4 presents the cumulative numbers of COVID-19 confirmed cases, recoveries and deaths reported in US, NY, and ND. The data starts at the beginning of the pandemic for US and ND, but it starts a while after the initial point for NY. The reason is that, back when the pandemic first started, a series of well-recorded numbers of recoveries were not available for many states, including NY. To obtain complete data on the three type of cases for computation, a cut-off is made. Therefore, the starting point of the data we collected for NY is about 2 months later than the actual date when the first case of COVID-19 was confirmed.
Due to the unavailability of the numbers of the exposed individuals E(t) in any of these regions, we substitute our model in section 2.1 with a simplified version as in Chen et al. (2020) that only includes the other three compartments S(t), I(t), and R(t).
To validate our algorithm, we compare the prediction results with known data to see how well it performs, or how large the prediction errors are. Then we implement the algorithm again to predict how the COVID-19 pandemic will spread in the future.
At the end of this section, we simulate the long-term development of the pandemic based on the epidemiology models proposed in sections 2, 3 by constructing certain conditions and assigning assumed values to the parameters listed in Table 1. Based on the results, we discuss what they indicate as well as what differences we expect to see in reality compared to the simulation.

Parameter Tracking and Prediction
First we compute the true values of the transmission rate β t and the recovery rate γ t using (12), (13), and (14). Then starting from the sixth day in the parameter series, we take the value of a time-dependent parameter for each day as a subject for testing and a 5-day window before it as a corresponding observation used for training, i.e., I, K = 5 in section 2.2. By doing this, we construct the training and testing sets for model fitting. The R package glmnet is used to fit the LASSO regression models and choose the optimal values of λ β and λ γ that yield the minimum mean crossvalidated errors. Figures 5, 6 depict the true values {β t , γ t |0 ≤ t ≤ T − 1} and predicted values {β t 1 ,γ t 2 |I + 1 ≤ t 1 ≤ T − 1, K + 1 ≤ t 2 ≤ T − 1} of both the transmission rate and the recovery rate of US, NY, and ND, respectively. The 95% prediction intervals are shown as the gray bands above and below the curves.
For the U.S. case, there was a sharp decrease in the transmission rate from mid-March to May, just about 1 week after the spread of the virus started. This was an evidence that the social distancing measures and community lockdowns implemented across the country have effectively and significantly slowed down the spread of the pandemic. It kept decreasing for about a month before a surge appeared in July, which is possibly caused by the nationwide celebration of Independence Day. In the fall, starting from early September, the transmission rate slowly rose again with increasingly larger oscillations, which showed consistency with the surge in the fall that pushed the total number of confirmed cases in US past 11M. This could be a result of a series of events prior to that (e.g., school opening, Halloween), and a prelude to the upcoming large gathering (e.g., Thanksgiving, Black Friday, Christmas). We expect this increase in the transmission rate to continue toward early 2021 and start to gradually decrease after the vaccination is administrated at a large scale in U.S. The recovery rate also had an slight increase around the same time in July but not as large as the one in the transmission rate. Overall, the recovery rate of U.S. is relatively steady and does not show any significant increasing or decreasing trend.
Similar to the US case, the transmission rate of NY started high and then reduced rapidly in the next few weeks. The trend maintained stationary for about 3 months until a rise appeared in late September and kept increasing toward the end. By December, the transmission rate is nearly as high as when it first started. The recovery rate of NY also had a large initial value followed by a 2month-long decrease, but no clear trend was shown after a small spike at the beginning of July.
As for the ND case, the recovery rate started with a mild increase in the first 2 month. Later on, it remained steady just like the previous two regions. For the transmission rate, the overall trend is much more stationary compared to the results of US and NY and no significant change could be observed. However, the true values of the two parameters of ND have the greatest oscillations, i.e., the largest ranges of oscillations, among the three regions. Note the two unusually acute spikes in the transmission rate respectively in May and December and one in the recovery rate in December that deviate from the entire curves. In the absence of any pre or post trend, we consider these points as outliers in this paper and exclude them in model training.

Algorithm Validation and Relative Percentage Errors
In this section, we use the computed values of the parameters to estimate the three variables S(t), I(t), and R(t) as in section 2.3. Instead of directly predicting future values for t > T, we use the historical data {I(t), R(t)|T − t w ≤ t ≤ T − 1} and the predicted parameter series {β t ,γ t |T − t w ≤ t ≤ T − 1} to estimate the last t w days of the entire period of time by which the data is covered, i.e., predict {Î(t),R(t)|T − t w + 1 ≤ t ≤ T}. Moreover, we also compare the proposed model with the classic SIR model with constant parameters by replacing the time-dependent parameter series with their means. We evaluate the model performance using the relative percentage errors (RPE) of the prediction for the infected group I(t) and the recovered group R(t) as follows: To assess the predictions of the proposed method and compare with the classic SIR model, we compute the RPE series for the past week (i.e., t w = 7) for the two models. The RPE series for US, NY, and ND are displayed in Figures 7, 8 respectively, with their means summarized in the top-left corner of each figure. Using the proposed model with time-dependent parameters, the mean relative percentage errors for I(t) and R(t), i.e., RPE I and RPE R , are 2.35 and 0.39% for US, 0.2 and 0.2% for NY, and 4.67 and 0.09% for ND, respectively. Using the classic SIR model with  constant parameters, RPE I and RPE R are 10.18 and 0.62% for US, 3.64 and 0.53% for NY, and 15.84 and 0.3% for ND, respectively. All errors are significantly larger than the former, which clearly shows the proposed time-dependent model yields better results in predicting the spread of the pandemic than the traditional SIR model with fixed parameters. Details of the model training and validation process are summarized in Table 2.

One-Day Prediction for I(t), R(t), and Basic Reproduction Numbers
Next we implement Algorithm 1 to predict the number of infected I(t) and recovered individuals R(t) for the future {Î(t),R(t)|T + 1 ≤ t ≤ T + t w }. We reset the prediction window t w to be 30, as we are to predict the spread of COVID-19 pandemic in the next 30 days after December 24, 2020. The results of 1-day prediction for US, NY, and ND are shown in Figures 9, 10, respectively. For NY, the sharp increase in the infected group since November is predicted to continue toward the next year, due to the oscillatory rise in the transmission rate shown in Figure 6. On the other hand, the growth of the recovered group remains slow. For ND, the number of infected will stay low after the small surge was contained in November, while the rapid growth in the recovered group is expected to be continuous but might slow down. For US, the prediction shows that both curves will keep climbing at a high rate, which indicates that there will still be a long way to go before the pandemic finally ends. The prediction results are summarized in Table 3.
To assess the spread of COVID-19, we also obtain the 1-day prediction for the time-dependent basic reproduction number R t using (31). The results for the three regions  are presented in Figures 11, 12, with horizontal lines representing R t = 1. As discussed in section 3.3, the virus will decline and gradually die out when R t < 1.
Otherwise, it will continue to spread. According to the results shown in Figures 11, 12, only very few points fall below the horizontal line, while the majority lies above it. For NY, the surge in fall, 2020 and some scattered large values agree with the increasing trends in both the confirmed cases and the transmission rate we see in Figures 4,  6, respectively.
The basic reproduction numbers R t for each of the next 30 days are estimated to be >1 for all three regions. The means of predicted values are found to be 2.48 for US, 22.28 for NY and 1.68 for ND, which suggests the inflection point, where R t stabilizes below 1 afterwards, has not been reached yet, especially for the NY case, where instead of having a decreasing trend, an increasing R t actually emerges over time. For US and ND, the curves gradually approaching the horizontal lines of R t < 1 indicates that the measures taken to tackle the pandemic are taking effect, but at this point it is sill too early to relax them.

Simulation Results for the SEIR and SEVIS Models
We also simulate the long-term development of the COVID-19 pandemic based on SEIR and SEVIS models. March 17, 2020, the first day in our US data, is chosen as the starting date of the pandemic in the simulations. For the SEIR model, we set the transition rate to σ t = 1 5.1 according to Table 1. To simulate as close to the reality as possible, we set the transmission rate β t and the recovery rate γ t to the means of their true value series obtained in section 4.1. To construct the initial conditions of the system, we use the initial values I(0) = 311 and R(0) = 27 obtained from the data as well. In previous studies, the average Infected-Suspected ratio in China, one of the earliest hot spots of the global COVID-19 outbreak, was found to be 2.399 (e.g., Fairoza Amira et al., 2020). In this simulation, due to the lack of data FIGURE 13 | Simulation based on the SEIR Model for the U.S. of the exposed group, we use the same ratio to initialize E(t), i.e., E(0) = 1 2.399 I(0) ≈ 130. According to the U.S. and World Population Clock (United States Census Bureau, 2020), the U.S. population is N = 329, 227, 746. Using (5), we have: With the aforementioned parameter settings and initial conditions, we simulate the COVID-19 pandemic for the US. As shown in Figure 13, the number of infected people reaches a peak in early July, 2020, and the pandemic gradually dies out in summer 2021. It is important to note that the simulation is only theoretical and restricted by given conditions. These conditions can be dramatically different in realty. Moreover, no mitigation measure of any kind that can possibly prevent or limit the spread of the virus is considered in the simulation, such as wearing facial coverings, social distancing, community lockdowns, and work-from-home policies. Being free of the influences of such factors indicates that the pandemic might develop slower in the simulation than in reality. Since many states of the U.S. are following the strict guidelines set by CDC, the pandemic is highly likely to end earlier than the simulation result.
Next, we take immunity, reinfection and vaccination into account, and simulate the pandemic according to the SEVIS model proposed in section 3.1. The parameter settings of β t , σ t , and γ t remain the same as in the SEIR simulation. For the vaccination rate v t , we clarify a starting date of vaccination t v . Before the vaccination starts, i.e., for t < t v , v t = 0. When t ≥ t v , v t becomes positive and based on the discussion in section 3, we assume v t to start at a low value in realty and exponentially increase as time goes on. Here, we simplify this process by assuming the mean of {v t |t ≥ t v } to be 1% per day and assigning it to v t , and let the vaccination start on January 1, 2021. As for the last parameter w in Table 1, the fraction of infected cases that become immune after recovery is currently unknown. In this simulation, we assume w to be 0.5. Figure 14 shows the simulation result with the vertical dashed line representing t = t v , (i.e., the first day of 2021). We notice that the trajectories obtained from the SEVIS model before the vaccination are nearly identical to the previous SEIR simulation. Once vaccination begins, the growth of the immunity group V(t) and the decrease of the infected group I(t) clearly accelerate. However, different from SEIR model which assumes no reinfection, the SEVIS model does allow reinfection, which leads to a longer time for the virus to die out. To speed up the process, we can employ a larger value for w, i.e., increased flows from I(t) to V(t) and reduced flows from I(t) to S(t).

CONCLUSION
Considering the incubation period of COVID-19, we first proposed a time-dependent SEIR model with the time-dependent parameters estimated by LASSO regression. The proposed model is validated using the national level data (the United States) and state level data (New York and North Dakota). Overall, our proposed model outperforms the SIR model with smaller prediction errors. Furthermore, by taking immunity, reinfection, and vaccination into account, we proposed a time-dependent SEVIS model without assuming guaranteed immunity after recovery as in the SEIR model. Simulations are performed using the proposed two models to predict the spread of COVID-19 pandemic for the United States.
With the daily recorded data in the U.S., our algorithm predicts that the numbers of the infected and recovered individuals will keep increasing at a high rate in the short future. The total number of confirmed cases in the U.S. is estimated to reach close to 25.7M by late January, 2021, while North Dakota and New York will face 1.26 and 0.96M total confirmed cases, respectively. Given the historical transmission and recovery rate of the COVID-19, the simulation of SEVIS model predicts that