Estimating the Prevalence of Asymptomatic COVID-19 Cases and Their Contribution in Transmission - Using Henan Province, China, as an Example

Background: Novel coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is now sweeping across the world. A substantial proportion of infections lead only to mild symptoms or are asymptomatic, but the proportion and infectivity of asymptomatic infections remain unknown. In this paper, we proposed a model to estimate the proportion and infectivity of asymptomatic cases, using COVID-19 in Henan Province, China, as an example. Methods: We extended the conventional susceptible-exposed-infectious-recovered model by including asymptomatic, unconfirmed symptomatic, and quarantined cases. Based on this model, we used daily reported COVID-19 cases from January 21 to February 26, 2020, in Henan Province to estimate the proportion and infectivity of asymptomatic cases, as well as the change in the effective reproductive number, Rt. Results: The proportion of asymptomatic cases among COVID-19 infected individuals was 42%, and their infectivity was 10% that of symptomatic ones. The basic reproductive number R0 was 2.73, and Rt dropped below 1 on January 31 under a series of measures. Conclusion: The COVID-19 epidemic spread rapidly in the early stage, with a large number of asymptomatic infected individuals having relatively low infectivity. However, it was quickly brought under control by national measures.


INTRODUCTION
In December 2019, cases of pneumonia of unknown cause were reported. The disease was later named novel coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (1,2). The rapid increase in confirmed cases and subsequent secondary outbreaks in many countries caused concern on an international scale. As a result, the World Health Organization declared the COVID-19 outbreak a Public Health Emergency of International Concern on January 30, 2020, and eventually classified it as a pandemic on March 11, 2020 (3). As of July 19, 2020, 14 million COVID-19 cases and 597,583 deaths had been confirmed globally, including 85,937 confirmed cases in China (4). Although the number of confirmed cases was staggering, only the more severely ill among those infected were being reported. Li et al. used a metapopulation model to estimate that 86% of the infections (presumably with mild or no symptoms) before January 23, 2020, were undetected in Wuhan, China (5); Chinazzi et al. used a GLEAM model to estimate that only one out of four cases was confirmed in Mainland China by February 1, 2020 (6,7); and Hao et al. used a SAPHIRE model to estimate that 87% of the infections before March 8, 2020, were unascertained in Wuhan, China (8). Some studies even suggested that most infections were caused by undetected cases (5,9). A significant proportion of these undetected infected individuals were asymptomatic (8). In one documented case, a patient who reported no symptoms and had a normal chest radiograph had multiple PCR cycle counts consistent with those of symptomatic patients (10), suggesting that such patients are somewhat infectious (11).
The proportion of asymptomatic cases is a critical epidemiological characteristic that modulates the pandemic potential of an emergent respiratory virus, and it is an important parameter in estimating the disease burden (5,12-14). Estimating the proportion of asymptomatic cases will improve the understanding of COVID-19 transmission and its spectrum of presentation, thereby providing insight into the spread of epidemics (14). However, the estimated proportion of asymptomatic infected individuals has varied widely from place to place. A recent analysis of 21 retrieved reports by the Centre for Evidence-Based Medicine in Oxford found that estimates of asymptomatic COVID-19 cases ranged from 5 to 80% (15). Meanwhile, most studies have shown only that asymptomatic infected individuals are less contagious than symptomatic ones (16,17). Only one previous study clearly quantified this, showing that asymptomatic cases could be one quarter as infectious as symptomatic cases in Ningbo, China (18). Therefore, it is important to estimate the proportion and infectivity of asymptomatic cases in various regions. Taking Henan Province as an example, we used a model-inference framework to explore the proportion and infectivity of asymptomatic cases, so as to estimate the prevalence of COVID-19.

Study Area
The study area is located in east-central China (31°23′ to 36°22′ north latitude, 110°21′ to 116°39′ east longitude; Figure 1), with a population of more than 96 million and an area of 167,000 km². Most of Henan is located in the warm temperate zone and has the characteristics of climate transition from plains to hills and mountains from east to west.
Abbreviations: COVID-19, coronavirus disease 2019; R0, the basic reproductive number; Rt, the effective reproductive number; SEAIUHR model, susceptible-exposed-asymptomatic-confirmed-unconfirmed symptomatic-hospitalized-removed model.

Source of Data
All data were obtained from the official websites of Provincial and Municipal Health Commissions (Supplementary Table 1), which published COVID-19 case data and information. The case data included the number of newly confirmed cases, cured cases, and deaths per day. The case information included age, gender, exposure history, date of symptom onset, and activity trajectory of confirmed cases. Identifiable personal information was removed for privacy protection.

Case Definition
Although the definition of COVID-19 cases has changed several times, which has greatly affected the observed epidemic curve in Wuhan (19), the change in case numbers in Henan Province has been relatively stable, and the diagnosis of all cases in this study was based on the sixth edition of the Diagnosis and Treatment Scheme for COVID-19 released by the National Health Commission of China (20). A case was defined as laboratory-confirmed if the patient tested positive for SARS-CoV-2 by real-time reverse-transcription polymerase chain reaction (RT-PCR) assay or high-throughput sequencing of nasal and pharyngeal swab specimens. Only laboratory-confirmed cases were included in this study.

Modeling the Epidemic of COVID-19 in Henan Province
To account for asymptomatic infected individuals, we constructed the susceptible-exposed-asymptomatic-confirmed-unconfirmed symptomatic-hospitalized-removed (SEAIUHR) model by extending the classic susceptible-exposed-infectious-removed (SEIR) model to include asymptomatic cases, unconfirmed symptomatic cases who did not seek medical attention or get tested because their symptoms were mild, and quarantined confirmed cases. In this model, we divided the population into seven compartments: S (susceptible), E (latent), A (asymptomatic infectious), I (confirmed symptomatic infectious), U (unconfirmed symptomatic), H (hospitalized), and R (removed). Susceptible individuals could acquire the virus after contact with infected cases (both symptomatic and asymptomatic) and then became latent, that is, infected but not yet infectious. After a period of time, some of the latent individuals developed symptomatic infections; some of these were confirmed and treated until they progressed to the removed stage, whereas others went unconfirmed because they did not present to healthcare facilities or get tested for their mild symptoms. The remaining latent individuals developed asymptomatic infections and remained infectious until they progressed to the removed stage. The removed stage included individuals who had recovered or died (Figure 2).
The dynamics of these seven compartments over time were described by a system of ordinary differential equations, in which βt was the transmission rate due to symptomatic infected individuals at time t, i.e., the rate at which symptomatic infected cases caused susceptible individuals to become infected, whether the resulting infections were asymptomatic or symptomatic; θ was the ratio of the transmission rate due to asymptomatic cases over that due to symptomatic cases; µ1 and µ2 were the proportions of asymptomatic and unconfirmed symptomatic cases among infected individuals, respectively; z was the latent period; r1, r2, and r3 were the infectious periods of confirmed symptomatic, asymptomatic, and unconfirmed symptomatic cases, respectively; and r was the duration from hospitalization to recovery or death. The differential equations in the model were numerically solved using a fourth-order Runge-Kutta (RK4) method. Specifically, at each step of the algorithm, each term on the right-hand side of the equations was determined using a random sample from a Poisson distribution (5).
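For orientation, one system of equations consistent with the compartment flows and parameter definitions given above can be sketched as follows. This is a reconstruction under stated assumptions, not necessarily the authors' exact formulation: it assumes that unconfirmed symptomatic cases (U) transmit at the same rate βt as confirmed symptomatic cases (I), that hospitalized cases (H) no longer transmit, and that N (a symbol introduced here) denotes the total population.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta_t S (I + U)}{N} - \frac{\theta \beta_t S A}{N}, \\
\frac{dE}{dt} &= \frac{\beta_t S (I + U)}{N} + \frac{\theta \beta_t S A}{N} - \frac{E}{z}, \\
\frac{dA}{dt} &= \mu_1 \frac{E}{z} - \frac{A}{r_2}, \\
\frac{dI}{dt} &= (1 - \mu_1 - \mu_2) \frac{E}{z} - \frac{I}{r_1}, \\
\frac{dU}{dt} &= \mu_2 \frac{E}{z} - \frac{U}{r_3}, \\
\frac{dH}{dt} &= \frac{I}{r_1} - \frac{H}{r}, \\
\frac{dR}{dt} &= \frac{A}{r_2} + \frac{U}{r_3} + \frac{H}{r}.
\end{aligned}
```

The right-hand sides sum to zero, so the total population N = S + E + A + I + U + H + R is conserved in this sketch.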
On January 25, 2020, Henan Province implemented a first-level public health emergency response to the epidemic and took a series of prevention and control measures, such as traffic restrictions, quarantine, contact tracing, and isolated treatment of confirmed cases (21,22). We assumed that these major government measures caused the transmission rate to change from a constant rate to a time-dependent, exponentially decreasing rate (23).
βt was therefore expressed as a piecewise function of time: it equaled β0, the transmission rate due to symptomatic infected individuals before the measures were implemented, until t1, the date on which the measures started, and decreased exponentially at rate α thereafter. The effective reproductive number, Rt, was computed from βt and the remaining model parameters. In the initial state, namely t = 0, Rt = R0, the basic reproductive number.
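The two expressions described here can be sketched as follows. The form of βt follows directly from the stated assumption of a constant rate followed by exponential decay, with α denoting the decreasing rate. The expression for Rt is an assumption: it is what a next-generation argument gives for a fully susceptible population if unconfirmed symptomatic cases transmit at the same rate as confirmed symptomatic cases, and it may differ from the formula the authors used.

```latex
\beta_t =
\begin{cases}
\beta_0, & t < t_1, \\
\beta_0 \, e^{-\alpha (t - t_1)}, & t \ge t_1,
\end{cases}
\qquad
R_t = \beta_t \left[ (1 - \mu_1 - \mu_2)\, r_1 + \theta \mu_1 r_2 + \mu_2 r_3 \right].
```

Each bracketed term is the probability that a newly infected individual follows a given pathway (confirmed symptomatic, asymptomatic, unconfirmed symptomatic) multiplied by the time spent infectious on that pathway, with the asymptomatic pathway scaled by θ.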

Estimation of Parameters in the Model
The initial states and parameter settings of the model are presented in Table 1. We assumed that the initial latent population, asymptomatic infected population, and unconfirmed symptomatic cases were drawn from the uniform distribution on [0,10], that the initial confirmed symptomatic infected population was 0, and that the rest of the population of Henan Province was susceptible. We estimated β0, µ1, µ2, θ, and α while holding the values of z, r1, r2, r3, and r fixed throughout the process. The initial values of each parameter to be estimated were drawn from uniform distributions using Latin hypercube sampling. The initial ranges of µ1, µ2, and θ were chosen to cover most possible values, i.e., [0,1]; the initial range of α was chosen to be broader than the range covered in previous research (23). The initial range of β0 was selected from the widest possible range of the basic reproductive number (R0). We used the Ensemble Adjustment Kalman Filter (EAKF) to infer the epidemiological parameters of the model based on the number of cases presenting symptoms per day in Henan Province (31-33). The EAKF is a data assimilation algorithm that needs only hundreds of ensemble members to obtain good results, is especially suitable for estimating high-dimensional model parameters (34,35), and has been successfully applied to epidemics such as cholera and influenza (32,35). In this study, we used 1,000 ensemble members and 1,000 independent realizations to infer the parameters and their corresponding 95% confidence intervals (CIs).
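To make the inference step concrete, the following is a minimal sketch of a single EAKF adjustment for one scalar observation, written in the general form of the ensemble adjustment Kalman filter. It is not the authors' code: the function name `eakf_update`, the array layout, and the observation error variance `obs_var` are assumptions made for illustration.

```python
import numpy as np


def eakf_update(ensemble, obs_ens, obs, obs_var):
    """Apply one EAKF adjustment for a single scalar observation.

    ensemble : array (n_vars, n_members) of state variables and parameters
    obs_ens  : array (n_members,) of model-predicted observations
    obs      : observed value (e.g., daily number of new cases)
    obs_var  : assumed observation error variance (hypothetical input)
    """
    prior_mean = obs_ens.mean()
    prior_var = obs_ens.var(ddof=1)
    if prior_var == 0.0:
        # A collapsed ensemble carries no spread to adjust.
        return ensemble.copy(), obs_ens.copy()

    # Posterior mean and variance of the observed quantity
    # (product of two Gaussians: prior ensemble and observation).
    post_var = prior_var * obs_var / (prior_var + obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)

    # Deterministically shift and contract the predicted observations.
    post_obs = post_mean + np.sqrt(post_var / prior_var) * (obs_ens - prior_mean)
    increments = post_obs - obs_ens

    # Regress the increments onto every state variable and parameter
    # using the ensemble covariance with the predicted observation.
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    cov = anomalies @ (obs_ens - prior_mean) / (obs_ens.size - 1)
    gain = cov / prior_var
    return ensemble + np.outer(gain, increments), post_obs
```

In a full filter this update would be applied once per day of data, with the model integrated forward between updates; unobserved parameters such as µ1 or θ are adjusted only through their ensemble covariance with the predicted case counts.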

Synthetic Testing
Before applying the model-inference framework to the daily incidence data, we tested its performance with model-generated outbreak data. Specifically, we fixed the parameters of the model to specified values and used the model to generate synthetic outbreak data. We then applied the EAKF algorithm to the synthetic daily outbreak data and assessed the model-inference framework by analyzing whether the model could fit the synthetic outbreak data and recover the parameters.
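Generating a synthetic outbreak of this kind can be sketched as below. The sketch is illustrative only and makes several assumptions: it uses a simple daily-step scheme with Poisson-sampled flows rather than the RK4 integration described earlier, it lets unconfirmed symptomatic cases transmit at the same rate as confirmed ones, and it records the daily flow from E to I as the number of new cases presenting symptoms. The function name `simulate_seaiuhr` and the dictionary keys are hypothetical.

```python
import numpy as np


def simulate_seaiuhr(params, init, n_days, t1, pop, seed=0):
    """Simulate daily symptomatic onsets from a stochastic SEAIUHR sketch.

    params : dict with beta0, alpha, theta, mu1, mu2, z, r1, r2, r3, r
    init   : dict of initial counts for any of E, A, I, U, H, R
    t1     : day index on which control measures start
    pop    : total population size
    """
    rng = np.random.default_rng(seed)
    p = params
    E, A, I, U, H, R = (int(init.get(k, 0)) for k in "EAIUHR")
    S = pop - (E + A + I + U + H + R)

    def draw(mean, cap):
        # Poisson-sampled flow, capped so no compartment goes negative.
        return min(int(rng.poisson(mean)), cap)

    onsets = []
    for t in range(n_days):
        # Constant transmission before t1, exponential decay afterwards.
        if t < t1:
            beta_t = p["beta0"]
        else:
            beta_t = p["beta0"] * np.exp(-p["alpha"] * (t - t1))

        infections = draw(beta_t * S * (I + U + p["theta"] * A) / pop, S)
        e_to_a = draw(p["mu1"] * E / p["z"], E)
        e_to_u = draw(p["mu2"] * E / p["z"], E - e_to_a)
        e_to_i = draw((1 - p["mu1"] - p["mu2"]) * E / p["z"],
                      E - e_to_a - e_to_u)
        i_to_h = draw(I / p["r1"], I)
        a_to_r = draw(A / p["r2"], A)
        u_to_r = draw(U / p["r3"], U)
        h_to_r = draw(H / p["r"], H)

        S -= infections
        E += infections - e_to_a - e_to_u - e_to_i
        A += e_to_a - a_to_r
        I += e_to_i - i_to_h
        U += e_to_u - u_to_r
        H += i_to_h - h_to_r
        R += a_to_r + u_to_r + h_to_r
        onsets.append(e_to_i)

    return np.array(onsets), (S, E, A, I, U, H, R)
```

Running the EAKF on the returned series and checking whether the known values of β0, α, θ, µ1, and µ2 fall inside the estimated intervals is the essence of the synthetic test. Any values supplied for z, r1, r2, r3, and r in such a run are placeholders, since the fixed values from Table 1 are not reproduced in this text.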

Sensitivity of Parameters Estimation to the Range of Initial States and Values of Fixed Parameters
In the initial states, the quantities E0, A0, and U0 were unknown, and our assumptions about them may affect the estimation of other parameters. Therefore, we also examined the parameter estimates obtained when the ranges of these initial states were narrowed and widened. In addition, we varied the value of each fixed parameter in turn to test the robustness of our results.

RESULTS
As shown in Figure 3, our model fit the reported daily incidence data well and accurately captured the peak and trend of the epidemic. The numbers of reported daily cases were within the confidence interval estimated by the model, except for a few days in the later stages of the outbreak. The mean estimate of the transmission rate due to symptomatic infected individuals was 1.14 (95% CI: 1.07-1.23) at the beginning of the epidemic, and the decreasing rate of the transmission rate after the implementation of prevention and control measures was 0.16 (95% CI: 0.12-0.19). Our model estimated that the asymptomatic rate among COVID-19 infected individuals was 42% (95% CI: 41-47%), and the mean ratio of the transmission rate of asymptomatic over symptomatic cases was 0.1 (95% CI: 0.02-0.11). At the same time, our model estimated that 11% (95% CI: 9-22%) of infected individuals were unconfirmed symptomatic cases who did not seek medical attention or get tested because their symptoms were mild (Table 2). Thus, the fraction of undocumented infections in Henan Province, comprising asymptomatic and unconfirmed symptomatic cases, was 53% (95% CI: 50-68%). Based on the above parameters, we estimated the average effective reproduction number, Rt, to be 2.73 (95% CI: 2.64-3.31) at the beginning of the epidemic, which was equal to the basic reproduction number (R0). With the implementation of measures, Rt fell below 1 on January 31.
The results of the synthetic test are shown in Figure 4 and Table 3. All generated values were within the confidence interval estimated by the model, and the values of all parameters were within the estimated 95% confidence intervals, which demonstrated the ability of the model-inference framework to fit the synthetic outbreak data and estimate all five target model parameters accurately.
The results of parameter estimation when changing the ranges of the initial states and the values of the fixed parameters are shown in Supplementary Table 2. The re-estimated epidemiological parameters fell near the values estimated under the original settings, with small fluctuations, indicating that the estimates of our model are robust.

DISCUSSION
Taking Henan Province as an example, we constructed a SEAIUHR model and used the EAKF algorithm to estimate the prevalence of asymptomatic COVID-19 cases and their contribution to transmission. This model-inference framework is also applicable to studies of asymptomatic infected individuals in other regions.
The asymptomatic proportion, which is broadly defined as the proportion of asymptomatic infections among all infections of the disease, is important for estimating the true burden of disease and its transmission potential. At present, the results of different studies on the asymptomatic proportion vary greatly (15). We estimated that the proportion of asymptomatic infections among infected individuals during the entire epidemic was 42% in Henan Province, which is within the confidence interval of the asymptomatic rate estimated from 13 cases imported from Wuhan to Japan (14). However, it was higher than that on the Diamond Princess cruise ship, where only 17.9% of those infected were asymptomatic (36). One possible explanation is that the passengers and crew on the Diamond Princess were not a random sample of the general population: most of them were older than 60 years and tended to have more severe symptoms after infection. Our model estimated that the mean ratio of the transmission rate due to asymptomatic cases over that due to symptomatic cases was 0.1, consistent with a study showing that prolonged exposure to infected persons and short exposure to symptomatic persons (such as those who are coughing) are associated with a higher risk of transmission, while short exposure to asymptomatic contacts is associated with a lower risk of transmission (24). The less contagious nature of asymptomatic individuals may be the result of a convolution of the shedding fraction of viable virus, the titer of viable virus in the primary/upstream case, and possibly behavioral factors. The fraction of undocumented infections, including asymptomatic cases and unconfirmed symptomatic cases who did not seek medical attention or get tested because their symptoms were mild, was lower than that of Wuhan in the early stage of the epidemic (5,6,8), which may be explained by the following reasons.
Firstly, in the early stage, medical resources were not yet fully in place and public awareness was still insufficient, whereas the undocumented rate gradually decreased as the epidemic developed (5,10,37). Secondly, the contact tracing measures implemented in China may have become infeasible when the number of cases in Wuhan rose sharply in the early stage (3). Finally, we need to point out that the differences in the estimated proportions of asymptomatic cases and unconfirmed symptomatic cases may be due to the unidentifiability of parameters in epidemiological models. A theoretical analysis of the identifiability of parameters in epidemiological models needs to be done in the future.
The basic reproductive number (R0) is an important parameter for determining whether an infectious disease will spread. If R0 < 1, the infectious disease would gradually decline and die out without an epidemic; if R0 > 1, an epidemic would break out. In this study, our estimate of Rt = 2.73 at the beginning of the epidemic corresponds to the basic reproductive number R0; that is, without intervention, each infected individual would infect an average of 2.73 susceptible individuals. This result was similar to those of some studies in other regions of China (28,38,39), although it was smaller than the results of some other research (38). Combined with the latent period, the number of cases without intervention would increase exponentially (25,29). However, Henan Province implemented a first-level response on January 25, 2020, and adopted a series of prevention and control measures. The isolated treatment of confirmed cases and the testing of suspected cases aimed to remove infected individuals from the process of transmission. The closing of public places and the change in crowd behavior served to protect susceptible groups. Contact tracing, which identified possible chains of transmission between known infected persons and their close contacts, affected both susceptible and asymptomatic individuals and can effectively interrupt transmission. With the help of these measures, Rt dropped below 1 on January 31, 2020.
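As a rough consistency check on this timing, the short sketch below computes when an exponentially decaying Rt would cross 1. It assumes that Rt is proportional to the transmission rate, so that Rt = R0·exp(−α(t − t1)) after the measures start, that the decay begins at the start of January 25, and that the point estimates reported in the results (R0 = 2.73, decreasing rate α = 0.16 per day) apply. The function names are hypothetical.

```python
import math
from datetime import datetime, timedelta


def days_until_rt_below_one(r0, alpha):
    """Days after the start of decay at which R0 * exp(-alpha * d) equals 1."""
    if r0 <= 1 or alpha <= 0:
        raise ValueError("requires r0 > 1 and alpha > 0")
    return math.log(r0) / alpha


def crossing_time(r0, alpha, start):
    """Calendar time at which Rt reaches 1, given the start of the decay."""
    return start + timedelta(days=days_until_rt_below_one(r0, alpha))


# Point estimates from the results; start of measures assumed at Jan 25, 00:00.
print(days_until_rt_below_one(2.73, 0.16))
print(crossing_time(2.73, 0.16, datetime(2020, 1, 25)).date())
```

Under these assumptions the crossing occurs a little more than 6 days after the measures begin, which places it on January 31, in line with the date reported above.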
This study also has some limitations. Firstly, our estimation of the asymptomatic proportion and infectivity was obtained by a model, which could not be generalized because it has not been confirmed by serological investigation. Secondly, we only used data from Henan Province, which might limit the interpretation of our results, although our model-inference framework is also applicable to studies of asymptomatic infected individuals in other regions. Therefore, large-scale relevant studies are needed in the future. Thirdly, this study estimated the average asymptomatic infection rate in Henan Province over time, but the asymptomatic rate may vary in different periods of the epidemic.

CONCLUSION
The epidemic situation developed rapidly in Henan Province, and there were a large number of asymptomatic infected individuals with relatively low infectivity. Our study further explored the prevalence of asymptomatic COVID-19 cases and their contribution to transmission so as to deepen people's understanding of asymptomatic cases and provide a reference for the prevention and control of COVID-19.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
CL and XL conceived of and designed the research. CL, YZ, CQ, LL, DZ, XW, KS, YJ, and TL did the analyses. CL wrote and revised the paper. DH, MX, and XL contributed to the writing and revisions. All the authors have read and approved the submitted version. All the authors have agreed both to be personally accountable for the author's own contributions and to ensure that questions related to the accuracy or integrity of any part of the work are answered.