Epidemiological and Genomic Analysis of SARS-CoV-2 in Ten Patients from a Mid-sized City outside of Hubei, China

A novel coronavirus known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of the ongoing COVID-19 pandemic. In this study, we performed a comprehensive epidemiological and genomic analysis of SARS-CoV-2 genomes from ten patients in Shaoxing, a mid-sized city outside of the epicenter, Hubei province, China, during the early stage of the outbreak (late January to early February 2020). We obtained viral genomes with >99% coverage and a mean depth of 296X, demonstrating that viral genomic analysis is feasible via metagenomic sequencing directly on nasopharyngeal samples with SARS-CoV-2 real-time PCR Ct values less than 28. We found that a cluster of 4 patients with travel history to Hubei shared the exact same virus with patients from Wuhan, Taiwan, Belgium and Australia, highlighting how quickly this virus spread around the globe. The virus from a second cluster, a husband and wife with no travel history who had contact with a confirmed case from another city outside of Hubei, had accumulated significantly more mutations (9 SNPs vs. an average of 4 SNPs), suggesting a complex and dynamic nature of this outbreak. We also found that 70% of the patients in this study had the S genotype, consistent with an early study showing a higher prevalence of the S genotype outside Hubei than inside Hubei. We calculated an average mutation rate of 1.37 × 10⁻³ nucleotide substitutions per site per year, which is similar to that of other coronaviruses. Our findings add to the growing knowledge of the epidemiological and genomic characteristics of SARS-CoV-2 that are important for guiding outbreak containment and vaccine development. The moderate mutation rate of this virus also lends hope that development of an effective, long-lasting vaccine may be possible.

medRxiv preprint doi: https://doi.org/10.1101/2020.04.16.20058560. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

[...] al. 2020 proposed that the L type may be more aggressive in replication rates and spreads more [...] position 28144 was determined to be L type, and T at position 8782 and C at position 28144 was [...]

There are two apparent clusters among these ten patients. The first cluster involves four patients who are relatives and traveled together to Hubei province in late January. The first patient in this cluster had symptom onset on their last day in Hubei province, while the other three patients had symptom onset 4-5 days after their trip (Table 1). The second cluster involves two patients who are family members that live together and did not travel to Hubei province.

[...] with an average of 17.1 million reads. A small percentage of these reads mapped to SARS-CoV-2 RNA (Table 2). The number of sequence reads that mapped to SARS-CoV-2 RNA ranged from 2,413 to 163,158, with an average of 49,066 reads. We observed a clear negative correlation [...] are only semi-quantitative and cannot be interpreted directly as viral loads.

Despite the large variation in SARS-CoV-2 RNA mapped reads, we were still able to obtain excellent coverage and depth when each genome was mapped to the first SARS-CoV-2 genome, Wuhan-Hu-1 (6) (Fig 1A). The coverage for all genomes was above 99% and the mean depth for the genomes ranged from 12X to 1024X (Table 2, Fig 1B). Genomes sequenced to a relatively low mean depth (12X to 47X) could still be genotyped successfully (see Results below), but our results suggest that SARS-CoV-2 read counts of at least 15,000 yield sufficiently high depth to characterize even low-prevalence or rare mutations.

[...] each genome while the Y-axis plots the cumulative percentage of bases covered to the specified depth.

To determine the single nucleotide polymorphisms (SNPs) of SARS-CoV-2 in these 10 patients, we mapped each genome to the original Wuhan-Hu-1 reference, which was collected on December 31, 2019 (6). The genomes contained a moderate number of SNPs (mean of 4 SNPs, range 1-9) (Table 3), consistent with previous reports of relatively low mutation rates (11). The genomes with the largest number of SNPs came from individuals who had contact with a confirmed case from Ningbo, Zhejiang and no travel history to Hubei province (Shaoxing-9 and 10). [...] them. Shaoxing-9 was infected first and then transmitted the virus to Shaoxing-10, whose virus gained a non-synonymous mutation (Table 4).
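
As an illustration of the SNP comparison against a reference described above, the following is a minimal sketch and not the pipeline used in this study. It assumes that a consensus sequence for each sample has already been aligned to the reference at equal length; the function name `list_snps` and the toy 12-nt sequences are hypothetical and are not SARS-CoV-2 data.

```python
from typing import List, Tuple


def list_snps(reference: str, consensus: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumption: both sequences are already aligned and have equal length.
    Positions where either base is not A, C, G or T (for example N or a gap)
    are skipped rather than counted as SNPs.
    """
    if len(reference) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(reference.upper(), consensus.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        if ref_base not in "ACGT" or alt_base not in "ACGT":
            continue  # ambiguous or missing base: no call at this site
        if ref_base != alt_base:
            snps.append((position, ref_base, alt_base))
    return snps


# Toy sequences for illustration only (hypothetical, not viral genomes).
toy_reference = "ATGCCGTTAGCA"
toy_sample = "ATGTCGTNAGCG"
toy_snps = list_snps(toy_reference, toy_sample)
print(toy_snps)
```

In practice, variants are usually called from the mapped reads rather than from a consensus, so that depth and allele frequency at each site can be taken into account; the sketch above only shows the positional comparison itself.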

We combined epidemiologic data with the SNP analysis to estimate the mutation rate of SARS-CoV-2 from these ten patients. We saw an average mutation rate of 1.37 × 10⁻³ nucleotide substitutions per site per year for SARS-CoV-2, which is similar to that of SARS-CoV-1, with a reported mutation rate of 0.80-2.38 × 10⁻³ nucleotide substitutions per site per year (12). These data demonstrate that the mutation rate of SARS-CoV-2 is similar to that of other coronaviruses.
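
The exact procedure used to derive this rate is not spelled out in the text. A common back-of-the-envelope estimate divides the number of SNPs relative to the reference by the genome length and by the time elapsed since the reference was collected. The sketch below follows that assumption. The reference date of December 31, 2019 is the one given in the text; the genome length of 29,903 nt for Wuhan-Hu-1 comes from outside this manuscript, and the function name, the sample date and the SNP count in the example are hypothetical.

```python
from datetime import date

# Length of the Wuhan-Hu-1 reference genome in nucleotides
# (assumption taken from outside this manuscript).
WUHAN_HU_1_LENGTH = 29903

# Collection date of the reference as stated in the text.
REFERENCE_DATE = date(2019, 12, 31)


def substitution_rate(snp_count, sample_date,
                      reference_date=REFERENCE_DATE,
                      genome_length=WUHAN_HU_1_LENGTH):
    """Estimate nucleotide substitutions per site per year.

    rate = SNPs / (genome length * years elapsed since the reference)
    """
    days = (sample_date - reference_date).days
    if days <= 0:
        raise ValueError("sample must be collected after the reference")
    years = days / 365.25
    return snp_count / (genome_length * years)


# Hypothetical example: a sample collected on 2020-02-01 carrying 4 SNPs.
example_rate = substitution_rate(4, date(2020, 2, 1))
print(f"{example_rate:.2e} substitutions per site per year")
```

Averaging such per-sample estimates over all ten genomes would give a single summary value, but whether the reported figure was obtained in this way is an assumption of this sketch.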

Our data support the hypothesis put forward by Tang et al. 2020, which states that human [...] SARS-CoV-2 (10). The less aggressive form of SARS-CoV-2 (S type) was allowed to increase in prevalence due to relatively weaker selective pressure. Although our sample size is small, 70% (7/10) of our patients were infected with the S type, and the majority of these (71%, 5/7) had traveled to or from Wuhan within 14 days of symptom onset. The dynamics of the S and L genotype distribution may have a role in assessing the severity of the outbreak, which is still spreading across the world as we write this manuscript. Our study adds to the growing body of evidence that the mutation rate of SARS-CoV-2 is no different from that of other coronaviruses, which is important for vaccine development (11, 13).
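
The S/L typing discussed here depends on the bases at genome positions 8782 and 28144. A minimal sketch of such a classifier is given below. The S rule (T at 8782 and C at 28144) is as stated in this manuscript; the L rule (C at 8782 and T at 28144) is an assumption taken from Tang et al. (10), because the corresponding sentence is truncated here. The function name and the toy genotype list are hypothetical; the list is constructed only to reproduce the 7/10 proportion and does not represent the actual per-patient calls.

```python
from collections import Counter


def classify_sl_type(base_8782: str, base_28144: str) -> str:
    """Classify a genome as 'S', 'L' or 'undetermined' from two sites.

    S type: T at 8782 and C at 28144 (as stated in the text).
    L type: C at 8782 and T at 28144 (assumption, following Tang et al.).
    Any other combination, including N, is left undetermined.
    """
    pair = (base_8782.upper(), base_28144.upper())
    if pair == ("T", "C"):
        return "S"
    if pair == ("C", "T"):
        return "L"
    return "undetermined"


# Hypothetical base calls at (8782, 28144) for ten genomes,
# chosen only so that 7 of 10 are S type.
toy_calls = [("T", "C")] * 7 + [("C", "T")] * 3
type_counts = Counter(classify_sl_type(a, b) for a, b in toy_calls)
s_fraction = type_counts["S"] / len(toy_calls)
print(type_counts, s_fraction)
```

Returning "undetermined" for mixed or ambiguous combinations keeps low-depth sites from being forced into one of the two types.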

The major limitation of this study is that we analyzed only 10 samples, owing to the requirement for sufficient SARS-CoV-2 RNA in a metagenomic sample. However, with the development of a SARS-CoV-2 probe enrichment kit, this type of deep sequencing analysis may be applied to samples with lower viral loads, thereby enabling more complete molecular epidemiological surveillance. In addition, the Ct value cut-off of 28 established in this study may not be directly applicable to other real-time PCR assays due to technical differences.

In summary, we showed that a full viral genomic analysis is feasible via metagenomic sequencing on nasopharyngeal samples with SARS-CoV-2 real-time PCR Ct values less than 28.

[...] Correlation. The X-axis plots the log value of the SARS-CoV-2 RNA reads, while the Y-axis plots the Ct values for the N gene for the ten Shaoxing patients.
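
The relationship shown in this figure can be quantified with a Pearson correlation between the log-transformed read counts and the Ct values. The sketch below is illustrative only. The statistic used in the study is not stated in the text, so the choice of Pearson's r and of a base-10 logarithm are assumptions, and the five read-count and Ct pairs are hypothetical values, not the patient data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration (not the Shaoxing patient data).
toy_reads = [2500, 8000, 20000, 60000, 150000]   # SARS-CoV-2 mapped reads
toy_ct = [27.5, 25.9, 24.1, 22.8, 20.6]          # N gene Ct values

toy_log_reads = [math.log10(r) for r in toy_reads]
toy_r = pearson_r(toy_log_reads, toy_ct)
print(f"Pearson r between log10(reads) and Ct: {toy_r:.3f}")
```

A negative coefficient corresponds to the pattern described in the text: samples with lower Ct values yield more SARS-CoV-2 reads.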