Whole-genome sequencing of SARS-CoV-2 isolates from symptomatic and asymptomatic individuals in Tanzania

Background: Coronavirus Disease-2019 (COVID-19), caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), accounts for considerable morbidity and mortality globally. The paucity of SARS-CoV-2 genetic data from Tanzania challenges in-country tracking of the pandemic. We sequenced SARS-CoV-2 isolated in the country to determine circulating strains, mutations and phylogenies, and to enrich international genetic databases, especially with sequences from Africa.
Methods: This cross-sectional study utilized nasopharyngeal swabs of symptomatic and asymptomatic adults with positive polymerase chain reaction tests for COVID-19 from January to May 2021. Viral genomic libraries were prepared using the ARTIC nCoV-2019 sequencing protocol version three. Whole-genome sequencing (WGS) was performed using the Oxford Nanopore Technologies MinION device. In silico genomic data analysis was done with ARTIC pipeline version 1.2.1 using the ARTIC nCoV-2019 bioinformatics protocol version 1.1.0.
Results: Twenty-nine (42%) out of 69 samples qualified for sequencing based on gel electrophoretic band intensity of multiplex PCR amplicons. Out of 29 isolates, 26 were variants of concern [Beta (n = 22); and Delta (n = 4)]. Other variants included Eta (n = 2) and B.1.530 (n = 1). We found a combination of mutations (S: D80A, S: D215G, S: K417N, ORF3a: Q57H, E: P71L) that was present in all Beta variants and absent from other lineages. The B.1.530 lineage carried mutations with very low cumulative global prevalence: nsp13:M233I, nsp14:S434G, ORF3a:A99S, S: T22I and S: N164H. The B.1.530 lineage clustered phylogenetically with isolates first reported in south-east Kenya, suggesting regional evolution of SARS-CoV-2.
Conclusion: We provide evidence of the existence of Beta, Delta and Eta variants and a locally evolving lineage (B.1.530) in samples collected in early 2021 in Tanzania. This work provides a model for ongoing WGS surveillance that will be required to inform on emerging and circulating SARS-CoV-2 diversity in Tanzania and East Africa.


Introduction
Coronavirus disease 2019 (COVID-19) is an emerging viral disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which belongs to the Coronaviridae family (1). An outbreak of the disease was first reported in December 2019 in Wuhan city, Hubei province, China. The disease rapidly spread to other regions, with sharply rising numbers of new infections. On 11 March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic (2). By 25 November 2022, a total of 636,440,663 confirmed cases of COVID-19 and 6,606,624 (∼1%) cumulative deaths had been reported globally to the WHO, of which Africa contributed 9,390,554 cases and 174,993 (∼2%) cumulative deaths (3). Tanzania reported its first case of COVID-19 on 16 March 2020 (4). The country experienced several waves of the COVID-19 pandemic and, as of 25 November 2022, had 40,471 confirmed cases of COVID-19 with 845 (∼2%) cumulative deaths (3). The country joined the COVID-19 Vaccine Global Access (COVAX) initiative in mid-2021 with limited knowledge of the molecular epidemiology of COVID-19 (5). COVID-19 vaccination is an effective means of containing the pandemic; however, antigenic drift, particularly at antigenic epitopes and receptor binding sites, may interfere with the compatibility of vaccine and field strains (6).
SARS-CoV-2 is an enveloped, positive-sense, single-stranded RNA virus with a genomic RNA size of approximately 30 kb (7). The viral genome has two major parts: the open reading frame 1a and b (ORF1a and ORF1b) toward the 5′ end of the genome, and the structural protein encoding region toward the 3′ end. ORF1 encodes 16 non-structural proteins that are responsible for the formation of the replication-transcription complex, including an RNA-dependent RNA polymerase (RdRp). The structural protein encoding region translates into the Spike (S), Envelope (E), Membrane (M) and Nucleocapsid (N) proteins of the virion (7). The S protein allows viral entry into the host cell by binding to the angiotensin-converting enzyme 2 receptors and is a common vaccine target (8). The genome of SARS-CoV-2 is prone to mutations that allow emergence of different variants, including the Variants of Concern (VOC). The strains assigned as VOC have high transmissibility and/or pathogenicity. Importantly, VOC can be associated with reduced vaccine and therapeutic effectiveness, as well as impaired detection by the currently approved diagnostics (9).
Whole-genome sequencing (WGS) of SARS-CoV-2 can address certain limitations of polymerase chain reaction (PCR) based diagnostics and additionally describe the specific SARS-CoV-2 variant or, when applied across larger populations of isolates, inform transmission or evolutionary dynamics. Capacity for WGS and complementary bioinformatics is not widely available in Tanzania, a problem shared by some other countries similarly burdened by the COVID-19 pandemic and a lack of early coordinated scientific response. For example, after Tanzania announced the first case of COVID-19 (4), routine testing of asymptomatic travelers was eventually administered, but routine testing of symptomatic individuals was never fully scaled to meet the population needs (10). Apart from 44 travel cases and 9 apparent travel cases, no SARS-CoV-2 sequences from Tanzania had been deposited in public repositories prior to this publication. Consequently, this study aimed to perform WGS of SARS-CoV-2 from a convenient but representative sample set in Tanzania to describe circulating variants, mutations and phylogenies, while developing a model for future response.

Study design and population
This retrospective cross-sectional study was done from January to May 2021 to characterize the whole genomes of SARS-CoV-2 isolates from archived nasopharyngeal swabs. The samples were collected from symptomatic and asymptomatic individuals with a PCR positive test for COVID-19 from three diverse regions in Tanzania. A symptomatic case was defined as an individual who had tested positive for SARS-CoV-2 using a nucleic acid amplification test and had at least one of the following signs and/or symptoms: fever, cough, tiredness, loss of taste or smell, sore throat, headache, aches and pains, diarrhea, a rash on skin, discoloration of fingers or toes, red or irritated eyes, difficulty breathing or shortness of breath, loss of speech or mobility or confusion, or chest pain. Asymptomatic cases were individuals who tested positive for SARS-CoV-2 using a nucleic acid amplification test but had no symptoms consistent with COVID-19; these were largely travelers (11).
Frontiers in Medicine 02 frontiersin.org

Settings, sample collection, and management
Nasopharyngeal swabs were obtained from the biorepository of the National Public Health Laboratory, Dar es Salaam, Tanzania and shipped to Kibong'oto Infectious Diseases Hospital (KIDH) Laboratory, northern Tanzania, for SARS-CoV-2 sequencing. Briefly, within the biorepository, we sorted all the archived nasopharyngeal swabs collected from regions near neighboring countries as well as administrative regions with international airports (Mwanza, Dodoma, Kilimanjaro and Dar es Salaam) (Figure 1). Thereafter, we randomly selected 69 COVID-19 PCR positive (cycle threshold values <30) nasopharyngeal swabs. These specimens were previously collected from symptomatic and asymptomatic travelers and from patients seeking medical care in health facilities in Tanzania. The nasopharyngeal swabs were kept in virus transport medium and were shipped at 2-8 °C to KIDH laboratory. Upon arrival at KIDH laboratory, the samples were immediately stored at −80 °C until viral RNA extraction, genomic library preparation and sequencing. SARS-CoV-2 PCR was not repeated at KIDH laboratory.

Laboratory procedures

SARS-CoV-2 RNA extraction
Severe acute respiratory syndrome coronavirus-2 RNA was extracted from 100 µL of nasopharyngeal swabs using the Quick-RNA Viral kit (Zymo Research Corp., CA, United States) as per the manufacturer's protocol. Briefly, the viruses in the swabs were inactivated by mixing equal volumes of specimen and DNA/RNA Shield (2X concentrate). Viral RNA purification was conducted using Zymo-Spin™ IC columns, whereby the RNA was allowed to bind to the columns and was washed with viral wash buffer and absolute ethanol (99.9%). The final eluted volume of viral RNA from each sample was 15 µL.

Genomic library preparation
Genomic library preparation was carried out using the ARTIC nCoV-2019 sequencing protocol v3 (LoCost) (12). Briefly, 2 µL of LunaScript RT SuperMix (5X) (New England BioLabs, MA, United States) was used to generate complementary DNA (cDNA) from 8 µL of the extracted viral RNA. No-RT control mix (5X) and a negative control (nuclease-free water) were included in the cDNA synthesis step to monitor cross-contamination between samples in the downstream PCR amplifications. Using Q5 Hot Start High-Fidelity 2X Master Mix (New England BioLabs, MA, United States), reactions 1 and 2 of the multiplex PCR were set to amplify 2.5 µL of cDNA in combination with 10 µM ARTIC nCoV-2019 V3 primer pools 1 and 2, respectively. Primer pool V3 with added primer pair 74_alt0 (Inqaba Biotec, Pretoria, South Africa) was later applied to reduce primer dropout (failure of the ARTIC v3 primers to hybridize to the targeted SARS-CoV-2 genome site, resulting in incomplete genome sequence coverage) in multiplex PCR. The programs for multiplex PCR were: heat activation at 98 °C, 30 s; denaturation at 95 °C, 15 s, 35 cycles; annealing/extension at 63 °C, 5 min, 35 cycles; and hold at 4 °C. Gel (1.5%) electrophoresis was done at 100 V for 20 min to assess the quality of amplicons from each of the two multiplex PCR reactions. Equal volumes of the amplified cDNA from the two reactions were combined, followed by 1:10 dilution using nuclease-free water. From this, 3.3 µL of the diluted PCR amplicons was end-prepared using the NEBNext Ultra II End Repair/dA-tailing module (New England BioLabs, MA, United States) by incubation at 20 °C for 15 min and 65 °C for 15 min. From the end-preparation step, 0.75 µL of the end-prepared reaction mixture was barcoded using Native Barcoding Expansion Kit 1-12 and 13-24 (ONT, Oxford, United Kingdom) by incubation at 20 °C for 20 min and 65 °C for 10 min. The barcoded cDNA were pooled together into a 1.5 mL Eppendorf tube.
Size selection of the pooled-barcoded cDNA with 400 bp was done using 0.4× Ampure XP beads (Beckman Coulter, IN, United States). Resuspension of the beads was done twice using 250 µL of short fragment buffer (ONT, Oxford, United Kingdom) followed by a single bathing of the pellets using 70% ethanol. Elution of the pooled-barcoded cDNA was done using 30 µL of nuclease-free water. Adapter Mix II (ONT, Oxford, United Kingdom) was ligated on 30 µL of pooled-barcoded cDNA by Quick T4 DNA ligase (New England BioLabs, MA, United States). The ligated libraries were washed twice using 250 µL of short fragment buffer (ONT, Oxford, United Kingdom) to remove excess adapters. Finally, 15 µL of the library was eluted from the beads by elution buffer (ONT, Oxford, United Kingdom). Quantification of the libraries was done using Qubit Fluorometer v2 (Invitrogen, Thermo Fisher Scientific, MA, United States).

Figure 1. Sample collection sites. A Google Earth map showing the locations in Tanzania where 69 nasopharyngeal swabs were collected from individuals confirmed with coronavirus disease-2019 (COVID-19) for this study. Labels with star show the locations where the 29 samples that were sequenced originated; these match the place names in Table 2.

SARS-CoV-2 sequences quality and genome assembly
The ARTIC nCoV-2019 bioinformatics protocol v1.1.0 (13) was used with ARTIC pipeline v1.2.1 to carry out in silico analysis of SARS-CoV-2 sequences. Briefly, basecalling was done with Guppy 5.0.7 using the super high accuracy (sup) model. Demultiplexing was done with Guppy barcoder 5.0.7. Basecalling and demultiplexing were later repeated with Guppy 6.0.1, with no changes in output. Read filtering was performed with guppyplex, selecting for lengths 300-700 bp. Reads were assembled and polished, and consensus sequences produced by the ARTIC MinION 1.2.0 pipeline using Medaka (model r941_min_high_g360, strict, no normalization). Multiple alignments of the consensus sequences were performed with MAFFT v7.475 (14) (globalpair, 5000 iterations). The reference genome was hCoV-19/Wuhan/WIV04/2019 (WIV04, GenBank accession MN908947.3). In-house tools were used to obtain read and base counts, Q7 percentage, and genome coverage. NextClade Web 1.10.0 (15) with default settings was used to assess sequence quality and amplicon dropouts.
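The read-length filtering step can be illustrated with a short sketch. This is not the guppyplex implementation; it only shows the idea of keeping reads within the 300-700 bp window stated above. The function names, the toy reads, and the choice of inclusive bounds are assumptions made for illustration.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a simple four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def filter_by_length(
    records: Iterable[FastqRecord], min_len: int = 300, max_len: int = 700
) -> Iterator[FastqRecord]:
    """Keep reads whose length lies in [min_len, max_len] (inclusive bounds assumed)."""
    for header, sequence, quality in records:
        if min_len <= len(sequence) <= max_len:
            yield header, sequence, quality


# Toy input (not study data): a short fragment, an amplicon-sized read,
# and an over-long read such as a chimera.
toy_fastq = io.StringIO(
    "@short\n" + "A" * 120 + "\n+\n" + "I" * 120 + "\n"
    "@amplicon\n" + "C" * 400 + "\n+\n" + "I" * 400 + "\n"
    "@chimera\n" + "G" * 900 + "\n+\n" + "I" * 900 + "\n"
)
kept = [header for header, _, _ in filter_by_length(read_fastq(toy_fastq))]
print(kept)  # ['@amplicon']
```

With ARTIC V3 amplicons of roughly 400 bp, a window of this kind discards short fragments and chimeric reads before assembly.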

SARS-CoV-2 lineage and clade assignment
The consensus sequences were assigned PANGO lineages (16) and WHO designations using Pangolin version 3.0.3 with pangoLEARN version 2021-11-25 and designation version 1.2.101 (17). NextClade Web 1.10.0 (15) was used to assign clades and WHO designations. Single nucleotide polymorphisms (SNPs) were obtained from the ARTIC pipeline "pass" variant call format output, translated to genomic loci using sc2calc (18), and validated using CoV-GLUE 0.1.18 (19). CoVsurver enabled by GISAID (20) and the mutation tracker in outbreak.info (21) were used to obtain prevalence and etiology information for mutations. NextClade Web 1.10.0 (15) output was used to visualize the phylogenetic placement of the isolates in NextStrain Auspice 2.32.0 (22). Assignment to a tree node was based on the genetic distance of an isolate to other isolates, where distance was measured in terms of the number of mutations that separate pairs of isolates. Tree branches with fewer nodes were sorted toward the top of the tree using default settings of Interactive Tree Of Life (iTOL) v5 (23). Also, iTOL v5 was used to make tree branch labels, alignment, leaf node symbols and other manipulations. To facilitate interaction with big phylogenetic trees, the files were kept as JSON files for visualization with the auspice web-based tool available at https://auspice.us/?d=tree&p=full.
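The distance measure described above, the number of mutations separating a pair of isolates, can be sketched as a count of mismatches between two aligned sequences. This is a simplified illustration and not the placement algorithm used by NextClade; skipping positions with N or gaps is an assumption, and the sequences are toy strings rather than study data.

```python
def mutation_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different called bases.

    Positions where either sequence has an ambiguous base (for example N)
    or a gap are skipped, so missing coverage does not inflate the distance.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    called = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in called and b in called and a != b
    )


# Toy aligned sequences (illustrative only).
reference = "ACGTACGTAC"
isolate_1 = "ACGTTCGTAC"  # one substitution relative to the reference
isolate_2 = "ACNTTCGAAC"  # one uncalled base and two substitutions

print(mutation_distance(reference, isolate_1))  # 1
print(mutation_distance(reference, isolate_2))  # 2
print(mutation_distance(isolate_1, isolate_2))  # 1
```

In this toy case isolate_2 is closer to isolate_1 than to the reference, which is the kind of relationship a distance-based placement relies on.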

Ethical considerations
The study protocol was reviewed and approved by Kilimanjaro Christian Medical University College research ethics review committee. Permission to conduct the study was granted by authorities of the KIDH and National Public Health Laboratory. Individual consent from participants was deemed not necessary for use of the archived and previously PCR-tested samples. The results of SARS-CoV-2 sequencing were shared with the Tanzanian Ministry of Health to inform policy actions.

Quality of PCR amplicons and sequence reads
Out of 69 individuals' nasopharyngeal swabs, 40 (58%) showed faint or no bands in 1.5% agarose gel electrophoresis of the amplified SARS-CoV-2 cDNA, suggesting that the RNA was degraded; these samples were therefore excluded from downstream library preparation. No bands were observed in the No-RT control and negative control after 1.5% agarose gel electrophoresis, indicating absence of cross-contamination between samples during the wet laboratory procedures. The remaining 29 (42%) samples showed strong bands in the gel electrophoresis and were sequenced. Percentage of bases called with Q-score ≥7 (80% accuracy) ranged from 89.
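The correspondence between a Q-score of 7 and roughly 80% base-call accuracy follows from the standard Phred relation, accuracy = 1 − 10^(−Q/10). The sketch below shows that conversion and how a Q7 percentage could be computed from a quality string. It assumes Phred+33 encoding; the function names and the toy quality string are illustrative and do not reproduce the in-house tools used in the study.

```python
def phred_to_accuracy(q_score: float) -> float:
    """Convert a Phred quality score to the probability of a correct base call."""
    return 1.0 - 10.0 ** (-q_score / 10.0)


def fraction_at_least_q(quality_string: str, threshold: int = 7) -> float:
    """Fraction of bases with Phred score >= threshold (Phred+33 encoding assumed)."""
    if not quality_string:
        raise ValueError("quality string is empty")
    scores = [ord(symbol) - 33 for symbol in quality_string]
    return sum(score >= threshold for score in scores) / len(scores)


print(round(phred_to_accuracy(7), 4))   # 0.8005
print(round(phred_to_accuracy(10), 4))  # 0.9

# Toy quality string: '!' = Q0, '(' = Q7, '5' = Q20, 'I' = Q40.
print(fraction_at_least_q("!(5I"))  # 0.75
```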

Characteristics of sequenced samples
Demographic and clinical characteristics of the 29 sequenced samples are presented in Table 2. Most samples (n = 17) were collected in April 2021, when the presumed "Beta wave" of the pandemic had subsided. Seventeen samples were collected in Dar es Salaam, the most populous city in Tanzania. Samples were collected from symptomatic individuals suspected to have COVID-19 (n = 17) and from asymptomatic travelers (n = 12). Out of 17 symptomatic individuals, 5 were hospitalized in January 2021 at tertiary hospitals in Dar es Salaam.

SARS-CoV-2 variants and diversity
Out of 29 isolates sequenced, 26 were variants of concern [Beta (n = 22); and Delta (n = 4)]. Other variants included Eta (n = 2) and B.1.530 (n = 1). Two of the four Delta isolates (Tanzania/KIDH-02B14/2021 and Tanzania/KIDH-02B15/2021) were from asymptomatic travelers arriving from India whose samples were collected on 3 May 2021 (Table 2), and were placed in NextStrain clades 21A and 21J, respectively. The other two Delta isolates (Tanzania/KIDH-03B05/2021 and Tanzania/KIDH-02B01/2021) were from local symptomatic suspects, collected in Mwanza on 22 April 2021 and 25 April 2021, respectively, and both were placed in NextStrain clade 21I. Strains from all three clades of Delta […] (Supplementary File 2). The common ancestors of the Beta variant seem to have come from southern Africa to the rest of the world (Supplementary File 2).
Based on the genomic proximity to neighboring sequences in the phylogenetic tree, most Beta isolates from Tanzania appear to descend from strains originating from bordering countries in the south and south-west Africa. Figure 2 […] File 2). Eta was a variant under investigation that is no longer circulating at a pace of global public health significance.

Table 2 footnotes. *The individuals were hospitalized in Dar es Salaam at different unknown dates. In the category column, all the travelers were outgoing except the two arriving from India, indicated by #. A symptomatic case was regarded as an individual who had tested positive for severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) using a nucleic acid amplification test with at least one of the following signs and/or symptoms: fever, cough, tiredness, loss of taste or smell, sore throat, headache, aches and pains, diarrhea, a rash on skin, discoloration of fingers or toes, red or irritated eyes, difficulty breathing or shortness of breath, loss of speech or mobility or confusion, or chest pain. Asymptomatic cases were individuals who tested positive for SARS-CoV-2 using a nucleic acid amplification test but had no symptoms that were consistent with COVID-19 and were largely from travelers.
The four isolates assigned to the Delta variant shared ten defining non-synonymous mutations for the Delta variant (S: T19R, S: L452R, S: T478K, S: P681R, S: D950N, ORF3a:S26L, N: D63G, N: R203M, N: D377Y), and a 6-nt deletion at 28248. The other signature deletion, 22029:6, was found in three out of the four isolates assigned to the Delta variant. The signature matrix mutation M: I82T for Delta was shared with the two Eta variants. The other two defining mutations for Delta, ORF7a: V82A and ORF7a: T120I, were only found in Delta isolates Tanzania/KIDH-02B14/2021 and Tanzania/KIDH-02B15/2021, collected from two arrivals from India. Other mutations that characterized the four Delta isolates were G15451A (nsp12:G671S) and C16466T (nsp13:P77L) (Supplementary File 1).
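The comparison of mutation profiles across lineages can be expressed as simple set operations: a mutation is shared if every isolate of a lineage carries it, and lineage-specific if, in addition, no isolate of another lineage carries it. The sketch below uses toy isolates with hypothetical names and abbreviated mutation lists chosen only to mirror the pattern described above (for example, M:I82T occurring in both Delta and Eta); it is not the study's mutation table.

```python
from typing import Dict, Set


def shared_mutations(
    calls: Dict[str, Set[str]], lineages: Dict[str, str], target: str
) -> Set[str]:
    """Mutations carried by every isolate assigned to the target lineage."""
    members = [calls[name] for name, lineage in lineages.items() if lineage == target]
    if not members:
        return set()
    return set.intersection(*members)


def lineage_specific_mutations(
    calls: Dict[str, Set[str]], lineages: Dict[str, str], target: str
) -> Set[str]:
    """Shared mutations of the target lineage that are absent from all other isolates."""
    others: Set[str] = set()
    for name, lineage in lineages.items():
        if lineage != target:
            others |= calls[name]
    return shared_mutations(calls, lineages, target) - others


# Toy data with hypothetical isolate names (illustrative only).
toy_calls = {
    "toy-delta-1": {"S:T19R", "S:L452R", "M:I82T", "ORF7a:V82A"},
    "toy-delta-2": {"S:T19R", "S:L452R", "M:I82T"},
    "toy-eta-1": {"M:I82T", "S:E484K"},
}
toy_lineages = {"toy-delta-1": "Delta", "toy-delta-2": "Delta", "toy-eta-1": "Eta"}

print(sorted(shared_mutations(toy_calls, toy_lineages, "Delta")))
# ['M:I82T', 'S:L452R', 'S:T19R']
print(sorted(lineage_specific_mutations(toy_calls, toy_lineages, "Delta")))
# ['S:L452R', 'S:T19R']
```

In the toy output, M:I82T is shared by all Delta isolates but is not Delta-specific because the Eta isolate also carries it, while ORF7a:V82A is excluded from the shared set because only one Delta isolate has it.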

Nucleotide substitutions in PCR primer binding sites
All 29 genomes had between one and three nucleotide substitutions in primer binding sites for RT-PCR tests (Supplementary File 1). A total of nine nucleotide substitutions in RT-PCR targeted genomic regions were identified, of which C28887T in the N gene was the most commonly observed (n = 20). This position is in the genomic site targeted by the China CDC RT-PCR test (Table 3).
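Checking whether a substitution falls inside a primer or probe binding site amounts to an interval lookup on reference coordinates. The sketch below parses substitution labels of the form used above (for example C28887T) and tests them against a table of binding-site intervals. The assay names and coordinates in the table are placeholders assumed for illustration; real intervals would have to be taken from the documentation of each RT-PCR assay.

```python
import re
from typing import Dict, List, Tuple

# Placeholder binding-site intervals (1-based, inclusive) on the reference genome.
# Names and coordinates are hypothetical and must be replaced with the
# published primer/probe positions of the assays being evaluated.
HYPOTHETICAL_PRIMER_SITES: Dict[str, Tuple[int, int]] = {
    "assay_A_N_forward": (28881, 28902),
    "assay_B_E_probe": (26332, 26357),
}

SUBSTITUTION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_substitution(label: str) -> Tuple[str, int, str]:
    """Split a label such as 'C28887T' into (reference base, position, new base)."""
    match = SUBSTITUTION.match(label.upper())
    if match is None:
        raise ValueError(f"not a nucleotide substitution label: {label!r}")
    ref_base, position, alt_base = match.groups()
    return ref_base, int(position), alt_base


def primer_sites_hit(
    label: str, sites: Dict[str, Tuple[int, int]] = HYPOTHETICAL_PRIMER_SITES
) -> List[str]:
    """Return the names of binding sites whose interval contains the substitution."""
    _, position, _ = parse_substitution(label)
    return [name for name, (start, end) in sites.items() if start <= position <= end]


print(primer_sites_hit("C28887T"))  # ['assay_A_N_forward'] with the placeholder table
print(primer_sites_hit("G15451A"))  # []
```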

Discussion
This study reports Beta and Delta variants of concern (VOC), as well as the Eta variant of interest, identified by whole-genome sequencing (WGS) of selected samples collected at different time points from January through May 2021 in Tanzania. Most isolates were from symptomatic individuals, indicating that SARS-CoV-2 was in active circulation at the time of sample collection. The variants carried a substantial number of mutations, including mutations in the primer binding sites of commonly used COVID-19 RT-PCR test kits, suggesting that SARS-CoV-2 has the potential to evade detection by RT-PCR tests. We further report lineage B.1.530 in a sample collected in January 2021, with mutations of very low global prevalence, that clustered phylogenetically with isolates first identified in Kenya, suggesting that it was a locally or regionally evolving lineage. These processes provide a model for describing SARS-CoV-2 evolution in Tanzania in the future and, for the first time in the country, we report the existence of Delta and Beta VOC in both symptomatic and asymptomatic individuals during the second wave of the pandemic. The VOC have increased transmissibility compared to other variants of SARS-CoV-2, and increased pathogenic potential toward severe disease (9). The increased transmissibility of VOC is conferred by highly prevalent spike mutations such as D614G and N501Y, which increase spike protein density (and hence infectivity) and enhance angiotensin-converting enzyme 2 (ACE-2) binding, respectively (25-27). Spike mutations L452R and T478K were also implicated in increased transmissibility, infectivity and pathogenicity of SARS-CoV-2 through immune evasion and strong affinity of the virus for cells via ACE-2 (28,29). In addition, spike mutation P681R was associated with increased viral replication and transmissibility of the Delta variant (30).
The potential of the VOC, particularly Delta, to spread from asymptomatic individuals was documented previously and this warrants additional measures such as contact tracing in order to curb the spread of the virus (31). Apart from that, the VOC can evade neutralizing antibodies after vaccination, as well as detection by the currently approved diagnostics. These attributes of the VOC are driven partly by the presence of mutations such as E484K in the spike and other targets of the vaccines (32). Likewise, spike K417N is associated with evasion from neutralizing antibody binding although not to a large extent compared to spike E484K mutation (33).
Our finding of a regionally evolving lineage, B.1.530, that was first identified in Kenya in October 2020, is of further importance (34). The lineage showed unique profiles of mutations such as S: N164H, S: T22I, nsp14:S323G, and nsp13:M233I with low cumulative global prevalence and undocumented phenotypes (35). The shared mutation nsp14:S323G with isolates of the same lineage in southern Kenya suggested that the lineage was evolving in the region. It also had the nsp13:M233I substitution that was previously reported in 5 countries, with <0.5% cumulative global prevalence (36). Likewise, we highlight mutation nsp14:S434G that had also very low global prevalence, with 43 reports from southern Kenya: Taita Taveta (n = 28), Kilifi (n = 7), and Mombasa (n = 8) as of October 2020 (34). Although the prevalence of the lineage appears to decline in the wake of the Omicron variant, further research to translate these mutations with low global prevalence to phenotypic characteristics may be particularly important in Tanzania where there may be some ongoing regional circulation.
Unexpectedly, we found that all 29 SARS-CoV-2 genomes had at least one nucleotide substitution in primer binding sites for RT-PCR. The presence of such substitutions increases the potential of variants to evade detection by RT-PCR diagnostic tests, as documented earlier in the pandemic (37). Mutations in the RT-PCR primer binding sites contributed to false negative RT-PCR tests (38), a driver of underestimation of the true burden of COVID-19 infections as well as of delayed clinical and epidemiological interventions (39). Periodic evaluation of the performance characteristics of COVID-19 RT-PCR test kits by sequencing the primer/probe binding sites is warranted. The choice of RT-PCR diagnostic test kits with multiple gene targets for SARS-CoV-2 may also increase recovery rates of the virus in specimens from Tanzania or other regions where this finding is common (40). Alternatively, two different diagnostic kits may be used in the laboratories to minimize false negative results.

Conclusion
Use of WGS provided evidence of circulation of Beta, Delta, and Eta variants and several regionally evolving variants with signature mutations circulating in Tanzania in early 2021. The VOC had mutations which confer reduced susceptibility to current vaccines and potential to evade detection by diagnostic tests such as RT-PCR. This work provides a model for WGS surveillance-based approaches to inform emerging and circulating SARS-CoV-2 diversity in Tanzania and elsewhere, and we advocate for periodic evaluation of performance characteristics of the RT-PCR tests in diagnosing rapidly mutating SARS-CoV-2 strains.

Study strengths and limitations
In addition to the ARTIC nCoV-2019 sequencing protocol v3 (LoCost) (12), we added a step of performing gel electrophoresis immediately after multiplex PCR to visualize the presence of amplicons. This step showed that more than half of the individual samples had faint or no bands, suggesting RNA degradation after collection or during transportation. The limited sample size and absence of prior SARS-CoV-2 sequences from Tanzania precluded detailed phylodynamic analysis. Without a larger spatio-temporally distributed dataset, it was difficult to make inferences on the provenance, transmission dynamics, and in particular the distribution of variants, whether novel, concerning, or otherwise, in the genome population. Another limitation was the moderate to poor genome coverage attained for a number of samples. Twelve sequences had coverage in the range 80-90% (24,000-26,999 non-N bases), and two were just below 80%. The major cause of this was amplicon dropout, which was partly remedied by adding primer pair 74_alt0 to the ARTIC V3 primer pool. Though most of the significant regions of the genomes were covered, such as the S gene and the sites that define the VOC assignments, SNPs located in the uncovered parts of genomes cannot contribute information to phylogenetic analysis, and hence lead to a less precise placement on the global phylogenetic tree. Finally, because archived samples were used, the study lacked the complete demographic and clinical data that would have strengthened the interpretation of genomic data for SARS-CoV-2 isolates. Despite these limitations, the processes developed identified steps for improving data completeness and linkage to conventional epidemiological resources that will position WGS as a service to the Tanzanian scientific community.
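The coverage figures quoted above relate the number of non-N bases in a consensus sequence to the length of the reference genome. A minimal sketch of that calculation is given below; it assumes the 29,903-nt length of GenBank accession MN908947.3 as the denominator and treats alignment gaps like N. The function name and the toy consensus are illustrative, not the in-house tool used in the study.

```python
from typing import Tuple

REFERENCE_LENGTH = 29903  # length of MN908947.3, used here as the denominator


def genome_coverage(
    consensus: str, reference_length: int = REFERENCE_LENGTH
) -> Tuple[int, float]:
    """Return (non-N base count, percent of the reference covered)."""
    uncalled = set("N-")  # treating gaps like N is an assumption
    called = sum(1 for base in consensus.upper() if base not in uncalled)
    return called, 100.0 * called / reference_length


# Toy consensus (not study data): 26,000 called bases and 3,903 masked positions.
toy_consensus = "A" * 26000 + "N" * 3903
called_bases, percent_covered = genome_coverage(toy_consensus)
print(called_bases, round(percent_covered, 2))  # 26000 86.95
```

A sequence like the toy example would fall in the 80-90% coverage band described above.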

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository and accession numbers can be found below: GenBank, OP236812-OP236838. The sequence files can also be found in GISAID with accession numbers EPI_ISL_16131912-EPI_ISL_16131937.

Ethics statement
The studies involving human participants were reviewed and approved by College Research Ethics Review Committee, Kilimanjaro Christian Medical University College. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions
SRM: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, resources, validation, visualization, writing - original draft, and writing - review and editing. MvZ: data curation, formal analysis, investigation, methodology, resources, validation, visualization, and writing - review and editing. CCK: investigation, methodology, resources, validation, visualization, and writing - review and editing. PMM: validation, visualization, and writing - review and editing. ANM, ASM, and RMK: resources, validation and writing - review and editing. TBS: data curation, validation, visualization, and writing - review and editing. GSK: conceptualization, validation, and writing - review and editing. GG: methodology, resources, and writing - review and editing. SKH: validation and writing - review and editing. JOC: conceptualization, supervision, methodology, validation, and writing - review and editing. SGM: conceptualization, funding acquisition, investigation, methodology, resources, validation, visualization, supervision, and writing - review and editing. All authors contributed to the article and approved the submitted version.

Funding
This work was funded by EDCTP2 program supported by the European Union project (grant number TMA2016SF-1463-REMODELTZ) and the NIHR (project reference 17/63/82) using UK aid from the UK Government to support global health research.
obtaining the specimens, and their submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based.