Viral Haplotypes in COVID-19 Patients Associated With Prolonged Viral Shedding

Background An increasing number of patients who have recovered from coronavirus disease 2019 (COVID-19) later test positive again for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by reverse transcription-polymerase chain reaction (RT-PCR). The explanation for clinical cases of long-term viral shedding is still controversial, and it remains unclear whether persistent viral shedding reflects re-infection or recurrence. Methods Specimens were collected from three patients with confirmed COVID-19, and whole-genome sequencing was performed on clinical specimens taken during their first hospital admission, when the SARS-CoV-2 viral load was high. Laboratory tests were analyzed throughout the whole course of the disease. Phylogenetic analysis was carried out for SARS-CoV-2 haplotypes. Results We found co-infection with multiple SARS-CoV-2 haplotypes in two COVID-19 patients (YW01 and YW03) with a long period of hospitalization. In contrast, only one viral haplotype was observed in the other patient (YW02), who had chronic lymphocytic leukemia. Patients YW01 and YW02 were admitted to the hospital after being infected as members of the same family cluster, but they had different haplotype characteristics in the early stage of infection; YW01 and YW03 were infected from different sources, yet carried similar haplotypes. Conclusion These findings suggest that haplotype diversity of SARS-CoV-2 may contribute to viral adaptation and persistent shedding in COVID-19 patients with multiple recurrences who had met the discharge requirements. However, the correlation between SARS-CoV-2 haplotype diversity and immune status is not absolute. These findings have important implications for the clinical management of COVID-19 patients with long-term hospitalization or recurrence.


INTRODUCTION
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, was first reported from Wuhan's Huanan Seafood Market in China in December 2019 (Li et al., 2020a; Zhu et al., 2020). Since the early cases were from China, further evidence pointed to a zoonotic origin of the SARS-CoV-2 outbreak (Zhou et al., 2020). Some studies also found that COVID-19 had spread among asymptomatic individuals in other countries several months before the first case was identified in China (Apolone et al., 2020; Zehender et al., 2020). However, previous studies have revealed that other viruses could contribute to the cross-reactivity of SARS-CoV-2 antibodies (Ng et al., 2020). SARS-CoV-2 is a highly contagious respiratory virus with a low mortality rate, so tracing early COVID-19 cases associated with Wuhan could help reveal the molecular characteristics of SARS-CoV-2.
According to China's COVID-19 Pneumonia Diagnosis and Treatment Guidelines (6th Edition), mild or asymptomatic infections and moderate infections with lung imaging manifestations, rather than severe disease, are the main clinical types (Ma et al., 2020). Previous reports showed that these two groups had better recovery with fewer hospitalization days. However, based on the criterion of two consecutive negative PCR test results, a high recovery rate of 16.8% or 21.4% has been found among COVID-19 patients who had positive PCR test results. In our previous study, a recovery rate of 31% was reported (Li et al., 2020b). Studies have shown that the detection methodology and intermittent shedding of SARS-CoV-2 affect the clinical laboratory detection of nucleic acid (Li et al., 2020c). Five cases of re-infection have been reported (Tillett et al., 2021; Bongiovanni, 2021). However, whether persistent viral shedding reflects re-infection or recurrence is still controversial, and the answer is key to working out disease control strategies and clinical treatment.
In this study, we conducted next-generation sequencing on respiratory samples from three patients in Yiwu, China, who had clinically moderate COVID-19 and long hospital stays. They came from two familial clusters of the disease. Through viral genome sequencing and clinical follow-up of the three patients, we found that prolonged viral shedding may be due to the existence of viral quasi-species during their first and second hospitalizations. These findings suggest that attention should be paid to identifying SARS-CoV-2 haplotypes, which would help to formulate more accurate discharge criteria, reduce "recovery" cases by confirming whether viral haplotypes exist, and avoid further human-to-human transmission.

Patients and Samples
The three patients (named YW01, YW02, and YW03) in this study had been hospitalized with an acute respiratory illness (pneumonia), showing lung involvement with ground-glass opacity, in the Fourth Affiliated Hospital, Zhejiang University School of Medicine, in Zhejiang Province, China, a hospital designated for the diagnosis and treatment of COVID-19 patients in February 2020. They had an onset of symptoms in January 2020 and were diagnosed as moderate cases of COVID-19 according to the severity of clinical symptoms. However, they were readmitted in March 2020, after being discharged in late January 2020 based on the COVID-19 Pneumonia Diagnosis and Treatment Guidelines in China. Respiratory tract specimens, such as nasopharyngeal swabs and sputum, and peripheral blood were collected in our hospital, and these biological samples were confirmed as being positive for SARS-CoV-2 nucleic acids. This study was approved by the Ethics Committee of the Fourth Affiliated Hospital, College of Medicine, Zhejiang University (Approval No. K20200026). Written informed consent was obtained from all patients when samples were collected. Patients were informed about the surveillance before providing written consent, and data directly related to disease control were collected and anonymized for analysis.

Laboratory Tests and Computed Tomography Scanning
Follow-up observation and monitoring of laboratory indicators were performed during hospitalization. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ribonucleic acid (RNA) was tested using real-time reverse transcription-polymerase chain reaction (RT-PCR) kits (Liferiver, Shanghai, China). Immunoglobulin G (IgG) and Immunoglobulin M (IgM) tests were performed against SARS-CoV-2 nucleoproteins using a colloidal gold immunodot assay (Innovita, Beijing, China). Whole blood analyses were performed with an automated hematology analyzer (Sysmex, Kobe, Japan), and cytokines were detected by a flow cytometry assay (Agilent, California, USA). Latex-enhanced immunoturbidimetry was used for C-reactive protein examinations (Mindray, Shenzhen, China). CT scanning was performed routinely as a follow-up after discharge.

Whole Genome Sequencing of SARS-CoV-2
SARS-CoV-2 RNA was extracted from 200 μl of respiratory tract specimens from the three COVID-19 patients, taken at the early stage of the first hospitalization when the viral load was high, using the QIAamp Viral RNA Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. Viral load was assessed by the crossing point (Cp) value (< 25 or 25-32), and nucleic acid pretreatment was adjusted accordingly, using probe capture technology to remove human-derived nucleic acid. Qualified libraries were measured and checked with the Invitrogen Qubit 2.0 fluorometer (ThermoFisher, Foster City, CA, USA) and the Agilent 2100 Bioanalyzer (Agilent, California, USA). Virus genomes were then sequenced with PE150 on the NovaSeq 6000 platform (Illumina, San Diego, CA, USA) before bioinformatics analysis, including data filtering and genome assembly. Genome sequences of Patients YW01, YW02, and YW03 described in this manuscript (Accession ID: EPI_ISL_3501737, EPI_ISL_3501738, EPI_ISL_3501739, EPI_ISL_3501740, EPI_ISL_3501741) are available from GISAID (https://www.gisaid.org/).

Haplotype Reconstruction
Haplotype reconstruction was performed with aBayesQR v1.0.0 (Ahn and Vikalo, 2018), a maximum-likelihood framework, with SNV_thres set to 0.05; only haplotypes with a frequency ≥ 0.10 were included in the following analysis.
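
The frequency cutoff described above can be illustrated with a minimal Python sketch. It assumes the reconstructed haplotypes are already available as (name, frequency) pairs; the function name, haplotype names, and frequencies below are hypothetical and are neither aBayesQR output nor values from this study.

```python
# Minimal sketch: keep only reconstructed haplotypes at or above a frequency cutoff.
MIN_HAPLOTYPE_FREQ = 0.10  # cutoff stated in the text

def filter_haplotypes(haplotypes, min_freq=MIN_HAPLOTYPE_FREQ):
    """Return the (name, frequency) pairs whose frequency is >= min_freq."""
    return [(name, freq) for name, freq in haplotypes if freq >= min_freq]

# Hypothetical example values, for illustration only.
example = [("hapA", 0.62), ("hapB", 0.31), ("hapC", 0.07)]
kept = filter_haplotypes(example)
print(kept)
```

Applying the cutoff after reconstruction, rather than inside it, keeps the threshold easy to vary when checking how sensitive the haplotype count is to that choice.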

GISAID Genome Sequences and Annotation
A total of 350 SARS-CoV-2 genome sequences belonging to the GISAID S clade and dated from January to February 2020 were downloaded from GISAID (https://www.gisaid.org/) (Elbe and Buckland-Merrett, 2017). After removing sequences with gaps (Ns) and ambiguous characters, 296 genome sequences from GISAID were finally included in our analysis. All these genome sequences were annotated with Exonerate (--model protein2genome:bestfit --score 5 -g y) using the CDSs annotated in NC_045512 (Slater and Birney, 2005; Abascal et al., 2010).
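
The sequence filtering step can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: it assumes FASTA input and treats any character other than A, C, G, or T as a gap or ambiguous base, which may be stricter than the criterion the authors applied. The record names are hypothetical.

```python
# Minimal sketch: drop genome sequences containing gaps (N) or ambiguous bases.
UNAMBIGUOUS = set("ACGT")

def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def keep_unambiguous(records):
    """Keep records whose sequence consists only of A, C, G and T."""
    return [(h, s) for h, s in records if s and set(s) <= UNAMBIGUOUS]

# Toy input with hypothetical record names (not GISAID data).
toy = """>seq1
ACGTACGT
>seq2
ACGNNCGT
>seq3
ACGTRCGT
"""
clean = keep_unambiguous(parse_fasta(toy))
print([h for h, _ in clean])
```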

Linkage Disequilibrium Analysis
The linkage disequilibrium of the specific sites was analyzed and visualized by Haploview v4.2 (Barrett et al., 2005).

Evolutionary Analysis
The evolutionary analysis was performed in the Datamonkey Adaptive Evolution Server (https://www.datamonkey.org/) (Kosakovsky Pond et al., 2020) with the adaptive Branch-Site Random Effects Likelihood (aBSREL) method (Smith et al., 2015) and also by CodeML with a branch model (Yang, 2007).
... is a raw distance score between proteins A and B, and Smax and Smin are the maximum and minimum values of the score, respectively. dRMS is the root mean square deviation (in angstroms) of distances between Cβ atom positions of aligned residues.

Statistical Analysis
Statistical analysis was performed using SPSS Statistics version 23 (IBM). A P-value of ≤ 0.05 was considered significant.

Case Report
The three cases were second-generation COVID-19 infections linked to Wuhan. YW01 and YW02 belong to the same familial cluster because both attended the same family party on January 12, 2020. Before YW01 was exposed to the infection, YW02 had been exposed on January 17, 2020 to her son, a Wuhan-related first-generation COVID-19 patient. Patient YW03, however, was associated with another familial cluster. The three patients had long follow-up periods of 76, 69, and 74 days, respectively.
Patient YW01 was a previously healthy 32-year-old woman with a history of familial cluster exposure who was diagnosed with COVID-19 in our hospital. Her chest X-ray was normal upon admission. Because she had been breastfeeding, she was treated with atomized inhalation of recombinant human interferon α-2b (IFN α-2b) as the antiviral treatment, supplemented with traditional Chinese medicine including Scutellaria, Reed Root, and Patchouli, among others. On Day 4, a chest CT scan revealed a dense patchy consolidation and ground-glass opacity in the right lower lobe (Figure 1A). Nasopharyngeal swab specimens were repeatedly positive for SARS-CoV-2 RNA, although there was no significant increase in CRP. Subsequently, chest CT scans showed gradual clearing of the lungs. On Day 28, the patient was discharged because three consecutive nasopharyngeal swabs tested negative for SARS-CoV-2 RNA and the CT scan showed obvious absorption (Figure 1B).
However, on Day 37, 8 days after being discharged, YW01 returned to our hospital for follow-up. Her chest CT showed new ground-glass opacities in both lungs (Figures 1C, D), while the original lesion in the right lower lobe was largely absorbed, and the SARS-CoV-2 RNA test was positive again. After ten days of the second hospitalization, YW01 was discharged again, and the course of the disease persisted for 48 days until CT results showed progressive absorption of the lesion (Figures 1E, F).
Patient YW02, a 72-year-old woman with a history of hypertension, was diagnosed with COVID-19 and chronic lymphocytic leukemia (CLL). Her chest CT showed multiple areas of subpleural ground-glass opacity in both lungs, and she tested positive for SARS-CoV-2 RNA (Figures 1G, H). She received a modified combination therapy consisting of antiviral treatment, Thymosin, and intravenous immunoglobulin (IVIG), supplemented with Chinese herbal medicine as adjuvant therapy similar to that of Patient YW01, and she was discharged when CT results showed absorption (Figures 1I, J). During the long follow-up period, her chest CT showed a transient progression of the lesion followed by gradual absorption (Figures 1K, L). The patient was readmitted to the hospital for isolation because she tested positive for SARS-CoV-2 RNA on Day 35 and Day 64 of the overall course; after treatment, the virus test turned negative and she was discharged on Day 52 and Day 69, respectively. The last CT scan, on Day 47, showed an improvement (Figures 1M, N).
Patient YW03 was a previously healthy 69-year-old man with a clear epidemiological history and a positive SARS-CoV-2 RNA test. His chest X-ray was normal upon admission. He similarly received antiviral therapy (Lopinavir/ritonavir, Arbidol, aerosol inhalation of Interferon α-2b), Thymosin, and Chinese herbs. Five days later, a chest CT scan showed a patchy ground-glass opacity in the upper and lower lobes of the right lung (Figures 1O, P). After treatment, a chest CT showed obvious absorption of inflammation in both lungs on Day 16, although a respiratory specimen repeatedly tested positive in a nucleic acid test (Figures 1Q, R). On Day 29, he was discharged because respiratory specimens were consecutively negative in a nucleic acid test. Thirteen days after his first discharge, the patient was readmitted because of consecutively positive results for SARS-CoV-2 during follow-up on Day 43, and he was given treatment similar to what he had received previously. IVIG and Hydroxychloroquine were also tried. His respiratory specimens remained positive in a nucleic acid test, although a chest CT showed that the ground-glass opacity was largely absorbed (Figures 1S, T). Patient YW03 was discharged again 18 days later. About 2 weeks after Day 73, a follow-up chest CT scan showed no sign of recurrence (Figures 1U, V) and the SARS-CoV-2 RNA test was negative.

Epidemiological and Clinical Characteristics of 3 Viral Recurrence Patients
The clinical diagnosis timeline of the three patients with confirmed COVID-19 is shown in Figure 2A. Serum specimens for anti-SARS-CoV-2 antibody (IgM and IgG) tests were collected during the first admission and the hotel isolation period, while sputum or nasal swabs for SARS-CoV-2 RNA testing were sampled throughout the whole course of the disease. The continuous SARS-CoV-2 RNA positive results indicated prolonged viral shedding, while anti-SARS-CoV-2 IgG antibodies were detected in the serum of the patients in February.
Previous studies had demonstrated the need for monitoring kinetic changes of inflammatory cytokine levels in COVID-19 patients. Therefore, we continuously monitored serum inflammatory cytokines, namely interleukin (IL)-2, IL-4, IL-6, IL-10, IFN-γ, and TNF-α, during repeat admissions in the three patients with long hospitalization (Figure S1). All patients in our study had moderate symptoms of COVID-19 with a long viral shedding period. Previous studies considered that there was little significant difference in the serum levels of these cytokines in mild patients; however, our study showed the contrary. Except for IL-6, the examined cytokines in Patients YW01 and YW03 showed higher levels during the first admission, while Patients YW01 and YW03 had high values of IL-2, IL-4, IL-6, and IFN-γ in the second admission. However, the IL-6 levels in all three patients exceeded the upper limit of normal (6.61 pg/ml) in the second admission. The only significant difference in serum TNF-α level was observed in Patients YW01 and YW03 between the first and second admissions. Moreover, we observed higher IL-2, IL-4, and IFN-γ levels in Patient YW01 than in Patients YW02 and YW03 in the second admission; however, the differences were not significant in the first admission.

Variants Landscape and Haplotypes Reconstruction
The sequencing read coverage is shown in Figure 3C; the average sequencing depths of YW01, YW02, and YW03 were 809.7, 5343.7, and 441.3, respectively. Aligned to the NCBI reference strain Wuhan-Hu-1 (Accession NC_045512.2), our strains showed more than 99.9% identity with the reference genome. A total of 37 variants (allele frequency, AF > 5%) were detected in our YW SARS-CoV-2 strains, including 22 missense, 14 synonymous, and 1 stop-gained variant (Table 1). Co-occurrence of substitutions C8782T and T28144C indicated that all our strains belonged to the GISAID S clade. The number of variants per sample showed no significant correlation with the sequencing depth (R = -0.96, p = 0.18).
Considering the low level of intra-host diversity in SARS-CoV-2 infection, we chose aBayesQR, a maximum-likelihood framework designed for reconstructing highly similar haplotypes, to obtain full-length viral haplotypes for our samples. With a frequency cutoff of 0.1, two haplotypes were reconstructed in each of the YW01 and YW03 samples, and one haplotype was found in YW02. Both YW01 Q1 and YW03 Q1 harbored variant T21835C, whereas haplotypes YW01 Q2 and YW03 Q2 harbored variants A12696G, C17678T, C18131T, C19718T, C24034T, T26729C, and G28077C (Figures 3A, B), which indicates that YW01 Q1 and YW03 Q1 came from one ancestral strain/quasi-species and YW01 Q2 and YW03 Q2 came from another.
We further examined the distribution of these 8 specific variants in all 187,857 SARS-CoV-2 genome sequences deposited in the GISAID database up to December 11, 2020. The allele frequencies of these variants are shown in Table 1. Three variants, C24034T, T26729C, and G28077C, were found to be highly linked, with D' larger than 0.85 (Figure 4). In addition, linkage was also observed between C19718T and C24034T, with D' = 0.97.
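
As a check on the reported correlation, the p-value for a Pearson coefficient computed from only three samples can be worked out in closed form, because the test statistic then follows a t distribution with one degree of freedom (a Cauchy distribution). The sketch below assumes a two-sided Pearson test, which the text does not state explicitly; the function name is illustrative.

```python
import math

def pearson_p_value_n3(r):
    """Two-sided p-value for a Pearson r from n = 3 points (df = 1).

    t = r * sqrt(n - 2) / sqrt(1 - r**2) with n = 3, and for df = 1 the
    t distribution is Cauchy, so p = 1 - (2 / pi) * atan(|t|).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    t = r / math.sqrt(1.0 - r * r)
    return 1.0 - (2.0 / math.pi) * math.atan(abs(t))

p = pearson_p_value_n3(-0.96)
print(round(p, 2))
```

With R = -0.96 this gives a p-value of about 0.18, in line with the reported value, and it shows why even a strong coefficient cannot reach significance with three samples.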
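
The D' statistic used for the linkage results can be illustrated with a short sketch. It assumes haploid genomes coded at two biallelic sites as '0' (reference) or '1' (variant) and applies the standard Lewontin normalization; it does not reproduce Haploview's implementation, and the toy haplotypes are hypothetical rather than GISAID data.

```python
def d_prime(haplotypes):
    """Compute Lewontin's D' between two biallelic sites.

    `haplotypes` is a list of 2-character strings; each character is the
    allele at one site, coded '0' (reference) or '1' (variant).
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "1" for h in haplotypes) / n   # variant frequency, site 1
    p_b = sum(h[1] == "1" for h in haplotypes) / n   # variant frequency, site 2
    p_ab = sum(h == "11" for h in haplotypes) / n    # both variants together
    d = p_ab - p_a * p_b                             # raw disequilibrium
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a site is monomorphic")
    return abs(d) / d_max

# Hypothetical toy data: the site-2 variant only ever occurs with the site-1 variant.
toy = ["11"] * 4 + ["00"] * 5 + ["10"] * 1
print(d_prime(toy))
```

A value near 1 means that at least one of the four possible allele combinations is essentially absent, which is the pattern expected for variants inherited together on one lineage.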

Phylogenetic and Haplotype Network Analysis
As our samples were classified as GISAID S clade, we collected all genome sequences belonging to the S clade from the GISAID database dated from January 2020 to February 2020 to investigate the potential epidemiological linkage between samples. After removing incomplete and low-quality genome sequences, a total of 302 SARS-CoV-2 sequences, comprising 5 haplotypes from our YW samples, NC_045512.2, and 296 S clade genome sequences, were subjected to haplotype network and phylogenetic analysis. As shown in Figure 5A, a median-joining network showed that YW01 Q1, YW03 Q1, and YW02 grouped with the major node from China with 3-6 nucleotide substitutions, whereas YW01 Q2 and YW03 Q2 clustered with samples from South Korea, Vietnam, the USA, and some samples from China. In addition, a maximum-likelihood phylogenetic analysis based on all coding regions also supported the separation of the 5 YW haplotypes into 2 branches (Figure 5B). YW01 Q2 and YW03 Q2 clustered, with high bootstrap support, with 16 SARS-CoV-2 strains that harbored the three linked variants C24034T, T26729C, and G28077C (Clade A), whereas YW01

Evolutionary Analysis
A few non-synonymous substitutions were observed in ORF1ab of YW01 Q2 and YW03 Q2. We set this branch as the foreground branch and used HyPhy and CodeML to test whether it is evolving at a faster rate in the ORF1ab coding region than expected from the remaining strains. No strong episodic diversifying selection was observed on this branch (ω > 1, p > 0.05) with either the aBSREL method of HyPhy or the CodeML branch model.
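
For the CodeML branch model, significance is conventionally judged with a likelihood ratio test against the one-ratio null model. A minimal sketch follows, assuming a single foreground branch (one extra free ω parameter, hence one degree of freedom). The log-likelihood values are hypothetical placeholders, not results from this study, and aBSREL uses its own test procedure that is not reproduced here.

```python
import math

def lrt_p_value_df1(lnl_null, lnl_alt):
    """P-value of a likelihood ratio test with one degree of freedom.

    The statistic 2 * (lnL_alt - lnL_null) is compared with a chi-square
    distribution (df = 1), whose survival function is erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        return 1.0
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods for illustration only.
p = lrt_p_value_df1(lnl_null=-1000.0, lnl_alt=-999.0)
print(p > 0.05)
```

In this toy case the alternative model improves the log-likelihood by only one unit, so the test does not reject the null at the 0.05 level, which mirrors the kind of non-significant outcome described above.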

Structural Modeling of the Protein With Missense Variants
The tertiary structures of the proteins harboring the 5 missense variants were modeled using the SWISS-MODEL server and subsequently compared to the wild type with MATRAS (MArkovian TRAnsition of Structure evolution server). The structural similarities were quantified using the Rdis and dRMS scores (Kawabata and Nishikawa, 2000). As shown in Figure S2 (1-5), all five missense variants have little effect on the tertiary structures of the corresponding proteins. The dRMS scores range from 0.06 to 0.11 Å, and the Rdis scores range from 97.9 to 100%.
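
A distance-based RMSD of the kind described for dRMS can be sketched as below. This is a generic illustration that assumes the residues are already aligned one-to-one and that one representative atom coordinate (for example Cβ) is given per residue; the exact definition used by the comparison server may differ, and the coordinates are hypothetical.

```python
import math
from itertools import combinations

def drms(coords_a, coords_b):
    """Distance RMSD between two equally long lists of (x, y, z) points.

    For every pair of aligned positions (i, j), compare the intra-structure
    distance in A with that in B, then take the root mean square difference.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two aligned coordinate lists of length >= 2")
    sq = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical coordinates in angstroms.
wild_type = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0)]   # rigid translation
stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # one atom moved
print(drms(wild_type, shifted), drms(wild_type, stretched))
```

Because only internal distances are compared, the score needs no superposition step: a rigidly translated copy scores zero, while a local distortion raises it.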

DISCUSSION
Currently, one of the topical issues for COVID-19 is that some patients test positive again after being discharged from hospital, although they had two consecutive negative RNA results before discharge. Previous studies supported explanations including false-negative results of SARS-CoV-2 PCR (Xie et al., 2020), intermittent shedding of SARS-CoV-2 (Li et al., 2020c), and re-infection (Bongiovanni, 2020; Larson et al., 2020). This study presents a novel perspective on prolonged viral shedding in COVID-19. Respiratory samples from three patients with confirmed COVID-19 were collected at the early stage of their first hospitalization, and whole-genome sequencing was performed on these clinical samples. We analyzed the results of nucleic acid tests, inflammatory cytokines, serum SARS-CoV-2 IgG and IgM, chest CT imaging, and other laboratory tests during the whole course of the hospital stays. We found that co-infection with SARS-CoV-2 haplotypes or quasispecies could occur at an early stage of infection and was accompanied by prolonged viral shedding. Another patient with confirmed COVID-19 and chronic lymphocytic leukemia had a long viral shedding period of 69 days and developed multiple relapses after initial discharge; however, only one viral haplotype was found. As reported in one case, an immunocompromised individual can show persistent SARS-CoV-2 infection and long viral shedding (Avanzato et al., 2020). Therefore, a weak humoral immune response in Patient YW02 may have contributed to her long viral shedding. The above evidence points to a correlation between SARS-CoV-2 haplotype or quasispecies diversity and recurrence in COVID-19 patients: co-infection could result in ineffective clearance of SARS-CoV-2, so that the humoral immune response and other risk factors are not the only contributors.
Virus-host interaction during the infectious stage determines the clinical outcome of viral infection, and this process is complex and dynamic (Teng et al., 2016). A broad host range facilitates the cross-species transmission of coronaviruses. It is worth noting that the highly infectious human pathogens SARS-CoV, MERS-CoV, and the pandemic-causing SARS-CoV-2 are β-coronaviruses, and current evidence suggests that their natural hosts are bats (Azhar et al., 2014; Menachery et al., 2015; Zhou et al., 2020). Specifically, the epidemic started in December 2019, and Wuhan was the first city with a COVID-19 outbreak, with the early cases being patients within China. Despite the emergency and limited medical resources during the early outbreak in Wuhan, cases associated with it exhibit the original infectious features in the evolution of the SARS-CoV-2 virus. Therefore, studies on next-generation COVID-19 cases associated with Wuhan have implications. In this study, we report the epidemiological, clinical laboratory, radiological, host transcriptional RNA, and viral genome findings of three moderately affected COVID-19 patients. They had not traveled to Wuhan and were local residents of Yiwu, Zhejiang Province, China, from two family clusters in which other family members had returned from Wuhan. Our findings are consistent with person-to-person transmission of SARS-CoV-2 in family settings, and the subsequent genome analysis of virus-host interaction offers an explanation for the frequent recurrence of SARS-CoV-2 RNA positivity after hospital discharge (Li et al., 2020d).
All COVID-19 patients in our study were considered moderate cases in our hospital, which is consistent with a study showing that 81% of COVID-19 cases in China are mild (Wu and McGoogan, 2020c). Multiple studies showed the general susceptibility of the population and indicated that risk factors for mortality include age, male sex, and pre-existing conditions including hypertension, smoking, coronary heart disease, cardiovascular disease, diabetes, lung diseases, and neurological symptoms, among others (Abdel-Mannan et al., 2020; St John and Rathore, 2020). Therefore, host immunity has been considered a reason for the re-occurrence of positive COVID-19 results. The case of Patient YW02 in our study showed that, with CLL, the virus was not effectively cleared, so that she was re-admitted twice during the 69-day follow-up. Patient YW03 had some risk factors, such as advanced age and male sex, while Patient YW01 was a young, healthy female. However, YW01 and YW03 showed a similar co-infection mode with SARS-CoV-2 strains from clades A and B.
Phylogenetic and haplotype analyses of viral genome sequences from these patients showed that different haplotypes or quasi-species of SARS-CoV-2 can exist simultaneously, in agreement with previous studies (Han et al., 2015). The evidence in our study showed that co-infection happened at an early stage of infection. However, the dominant strain underwent selection within each individual (Figure 2B). The subsequent clinical phenotype could reflect either a single retained dominant strain or the coexistence of multiple strains of SARS-CoV-2. Therefore, we speculate that not only immune deficiency but also SARS-CoV-2 haplotypes in COVID-19 patients could be associated with prolonged viral shedding, and even with recurrent positive SARS-CoV-2 RNA tests.
Our study has several limitations. First, the speculative conclusion of our study should be validated in larger multicenter clinical cohorts, and whole-genome analysis during the second episode is needed to assess the effect of SARS-CoV-2 quasi-species. Second, since viral characteristics in COVID-19 patients were not available during the early stages of the epidemic, seroconversion of IgM could not be observed in the earliest infected patients until late March. Hence, the samples we collected showed no regularity. Third, whether prolonged viral shedding means re-infection or recurrence remains controversial, and the result may be influenced by the host immune response to SARS-CoV-2 and by the low sensitivity of RT-PCR kits in the early stage (February 2020). Nevertheless, our study provides a detailed correlation analysis of clinical presentation and viral quasi-species at an early stage of exposure and infection.
In conclusion, we found a correlation between SARS-CoV-2 haplotypes and prolonged viral shedding in Wuhan-associated cases in early 2020 and highlight the importance of accurate molecular typing of SARS-CoV-2, which is of great significance for the clinical management of COVID-19 patients with long hospitalization or recurrence.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are publicly available. The data presented in the study are deposited in the GISAID (https://www.gisaid.org/) repository, accession numbers EPI_ISL_3501737, EPI_ISL_3501738, EPI_ISL_3501739, EPI_ISL_3501740, and EPI_ISL_3501741.

ETHICS STATEMENT
This study was approved by the Ethics Committee of the Fourth Affiliated Hospital, College of Medicine, Zhejiang University (Approval No. K20200026). The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
WY and WX collected data. XL and SZ contributed to statistical analyses. TL and HT analyzed CT images. YH, KX, and YW edited figures and tables. YW edited the manuscript. XX and KX reviewed the manuscript. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
The authors thank the researchers who generated and shared the sequencing data from GISAID (https://www.gisaid.org/) on which this research is based. We thank all individuals of the Fourth Affiliated Hospital, College of Medicine, Zhejiang University, who participated in this study. We thank KS Xu, JG Wu, and X Hu for their helpful comments on this study.