The Delta variant wave in Tunisia: Genetic diversity, spatio-temporal distribution and evidence of the spread of a divergent AY.122 sub-lineage

Introduction The Delta variant posed an increased risk to global public health and rapidly replaced the pre-existent variants worldwide. In this study, the genetic diversity and the spatio-temporal dynamics of 662 SARS-CoV2 genomes obtained during the Delta wave across Tunisia were investigated. Methods Viral whole genome and partial S-segment sequencing was performed using Illumina and Sanger platforms, respectively and lineage assignemnt was assessed using Pangolin version 1.2.4 and scorpio version 3.4.X. Phylogenetic and phylogeographic analyses were achieved using IQ-Tree and Beast programs. Results The age distribution of the infected cases showed a large peak between 25 to 50 years. Twelve Delta sub-lineages were detected nation-wide with AY.122 being the predominant variant representing 94.6% of sequences. AY.122 sequences were highly related and shared the amino-acid change ORF1a:A498V, the synonymous mutations 2746T>C, 3037C>T, 8986C>T, 11332A>G in ORF1a and 23683C>T in the S gene with respect to the Wuhan reference genome (NC_045512.2). Spatio-temporal analysis indicates that the larger cities of Nabeul, Tunis and Kairouan constituted epicenters for the AY.122 sub-lineage and subsequent dispersion to the rest of the country. Discussion This study adds more knowledge about the Delta variant and sub-variants distribution worldwide by documenting genomic and epidemiological data from Tunisia, a North African region. Such results may be helpful to the understanding of future COVID-19 waves and variants.


Introduction
Since December 2019, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV2) emerged and spread rapidly all around the world, causing a global health, social and economic crises (1)(2)(3). The World Health Organization (WHO), all international expert networks and national public health authority institutions have been mobilized to monitor the evolution of SARS-CoV-2 and track emergent variants and sub-variants. Variant surveillance is based on genetic characterization and rapid sharing of viral genomic sequences, in order to limit their spread and control the pandemic (4). SARS-CoV-2 variants are classified as Variants Of Concern (VOCs), Variants Of Interest (VOIs) and Variants Under Monitoring (VUMs) (5). VOCs are associated to increased transmissibility and/or virulence, and/or to reduced efficacy of diagnostic tests, vaccines or treatments. VOIs are associated with significant community transmission with multiple foci in multiple countries and/or increasing prevalence. VUMs present unclear phenotypic or epidemiological repercussions requiring enhanced surveillance.
Among VOCs, the Delta variant posed an increased risk to the global public health and rapidly displaced the then dominant Alpha variant globally and changed the epidemiological infection landscape of this virus during 2021 (6). Delta variant is the common name used to refer to the B.1.617.2 Pango-lineage (7). In late 2020/early 2021, the parental strain B.1.617 emerged and rapidly spread to at least 60 countries (8)(9)(10). The first detected sub-lineage was B.1.617.1, also named Kappa variant, followed by B.1.617.2, the Delta variant, and B.1.617.3, the Epsilon variant (7). The Delta variant is one of the most contagious variants of the SARS-CoV-2. First detected on March 01 2021, it spread rapidly to almost all countries of the world, replacing the Alpha variant which was predominating globally (7, 11). Delta harbors mutations along the genome; among those in the gene encoding the spike protein, the substitutions T19R and T478K as well as the deletion E156-F157 may enhance antibodies' evasion causing escape from the host immune response (11,12) and L452R, P681R may help cell entry by enforcing attachment to the ACE2 receptor and membrane fusion (13)(14)(15). Several studies demonstrated increased replication fitness (6) and reduced sensitivity to neutralizing antibodies from past infection (14) which contributed to the deadly epidemic wave in India during the first quarter of 2021, as well as the epidemic waves in England, South Africa and then almost all countries in the world (16)(17)(18)(19)(20). Up to February 2022, Delta continued to circulate globally, even after the emergence and spread of Omicron variant, infecting especially unvaccinated populations or in-complete vaccinated populations, including children. Furthermore, an important variability of this variant was reported in different areas of the world. Up to 2022-11-03, Delta variant has been shown to circulate into 245 different sub-lineages: AY.1-AY.134 (8).
At the beginning of COVID-19 pandemic, Tunisia adopted a very stringent lockdown regime from 10th of March to 3rd of May 2020 which helped contain the virus spread. Since July 2020, the country experienced several waves of COVID-19. The first two ones, in 2020 and up to January 2021, were caused by several SARS-CoV2 lineages (21-23). The first detected VOC was the Alpha variant that induced an extensive wave from March to June 2021. In May 2021, Delta was detected for the first time and started replacing the Alpha variant, causing a substantial increase in the number of cases with high transmission rates within the community (23). It became rapidly the dominant variant in the country with great impact on public health and healthcare systems, increasing the number of hospitlaisation requirement for oxygen. By the end of December 2021, a slight decline of COVID-19 incidence was notified but, since January 2022, the number of cases increased again causing another wave mainly induced by the Omicron variant.
This study reported demographic and genetic data of 662 SARS-CoV2 Tunisian sequences belonging to the Delta variant, obtained from oropharyngeal swabs collected from May to December 2021, during the Delta wave. The genetic diversity and the spatio-temporal dynamics of virus transmission in the country are described.

Ethical statement
This study was performed under ethical standards according to the 1964 Declaration of Helsinki and its later amendments. The samples were collected in the context of COVID-19 diagnostic activities. They were used in the present study after de-identification with respect to patient anonymity and after the approval of the Bio-Medical Ethics Committee of Pasteur Institute of Tunis, Tunisia (2020/14/I/LR16IPT/V1). Written informed consent to participate in this study was provided by the participants 'legal guardian. No animal studies are presented in this manuscript. No potentially identifiable human images or data is presented in this study.

Tunisian sequences
A total of 662 SARS-CoV-2 sequence, obtained from positive nasopharyngeal samples and lately identified as belonging to the Delta variant, were included in the present work. They were obtained from May, 26 to December 1st, 2021, during the Delta wave in Tunisia ( Figure 1). The samples were collected by the Tunisian Ministry of Health personnel (MoH's personnel). They included samples collected from home-quarantined and hospitalized individuals. The collected samples were transported, under refrigeration and within 24 h, to the Pasteur Institute of Tunis where they were immediately processed for SARS-CoV-2 RNA detection by specific realtime reverse transcription PCRs (RT-qPCR) according to WHO-approved protocols (24,25). Samples were tested by Charité, Berlin protocol [singleplex envelop (E) and singleplex RNA-dependent RNA polymerase (RdRp)] (24), Hong Kong Universiy, China protocol (singleplex nucleoprotein (N) and singleplex Open reading frame Orf1b) (25), according to the reagents availability, in the pandemic context. SARS-CoV-2 variant identification was assessed using a combination of nextgeneration sequencing-based approaches (N = 484 samples) and by partial sequencing of the S gene (N = 178 samples).
PCR amplification and partial sequencing in the S gene PCR amplification of S gene was performed as previously described (22)  The quality control of raw data was first checked using FastQC version 0.11.9 for (https://www.bioinformatics. babraham.ac.uk/projects/fastqc/). Low-quality reads and adapters were filtered using Trimmomatic version 0.39 (27) with a Phred quality score of 30 as the threshold. Consensus sequences were generated by mapping to the SARS-CoV-2 reference genome (NC_045512) via Spades assembler version 3.15.0 (28), using thresholds of 80% for nucleotide sequence coverage and 90% for nucleotide similarity. The sequences were submitted to the GISAID database (https://www.gisaid.org) (29, 30).

Variant assignment and single nucleotide polymorphisms identification
The Fasta files of the whole sequences were uploaded into Pangolin online software version 3.1.17 (https://cov-lineages. org/pangolin.html; accessed on 01/16/2022) for variant and sublineage assignment (31). Sub-lineages within the Delta variant could be assigned for those having a coverage of more than 85% of the virus genome. The web-based software Nextclade v1.12.0 (https://clades.nextstrain.org/) was used for clade assignment, diversity and gene-wise amino acid mutations analysis (32). Maximum-likelihood trees were generated by the use of IQ-TREE multicore program version 1.6.12 (33).
For the 178 partial sequences and the NGS sequences with incomplete coverage of the whole genome, variant determination was based on the identification of key mutations specific to the Delta variant along the partial and complete S gene, respectively.

Mutational cascade analysis
Mutational cascade was analyzed for AY.122 variant of SARS-CoV-2 circulating in Tunisia using NUCmer pipeline (version 4.0.0) and the R script detailed by Bansal and Kumar (34). The SNP analysis was conducted by NUCmer v.4.0.0 using genome of first reported AY.122 variant (OV103802.1). This reference genome was retrieved from NCBI Genbank (https:// www.ncbi.nlm.nih.gov/nuccore/OV103802.1/) and chosen for no N character in the nucleotide sequence as selection criteria. The SNP file generated by NUCmer v.4.0.0, the R script was launched to identify: most mutated samples, number of mutations per sample, mutation classes, and most frequent mutational events per type. In this analysis, a gff3 annotation file of the Wuhan reference genome (NC_045512.2) was used to extract coordinates of the proteins encoded by SARS-CoV-2 genome.

Phylogenetic analysis
Viral phylogeny was inferred on Tunisian generated sequences and others originating from different areas of the world, available at the NCBI GenBank

Phylogeographic analysis
To explore spatio-temporal distribution of investigated strains, the temporal signal was undertaken using TempEST v1.5.4 (38). Bayesian phylogeographic analysis used BEAST v1. 8.4 (39), the discrete diffusion model approach was applied (40). The GTR (I+Γ 4), strict clock and exponential growth models are applied as the best-fit models. To estimate the transition rates across regions, the Bayesian Stochastic Search Variable Selection (BSSVS) approach was inferred with application of the symmetric substitution model. A 200-million iteration was run, achieving a satisfying ESS value (ESS>200). By discarding 10% of burn-in, the time-scaled maximum clade credibility (MCC) tree was generated with Tree Annotator v1.8.4 (40) and visualized with FigTree v1.4.3 (37) programs. To evaluate the robustness of virus transitions between geographic locations, the Bayes factors (BF) were calculated by using the SpreaD3 program version 1.0.7 (41). A transition rate is considered as statistically valid when BF>3. The spatiotemporal dynamics of strains was studied by generating a KML file with the SpreaD3 program and then visualized via Google Earth Pro.

Data availability
Complete genome SARS-CoV-2 sequences generated in this study were submitted to the GISAID database (https://www. gisaid.org) (29, 30). Their accession numbers are provided in Supplementary Table S1.

Epidemiological features of collected samples
The Delta sequences included in this study were obtained from individuals originating from 23 of the 24 Tunisian governorates. They were collected from symptomatic patients with a wide range of clinical presentations, as well as asymptomatic individuals sampled after contact with confirmed cases. The study population included 303 males and 359 females; their ages ranged from 1 month to 90 years. Figure 2 displays the distribution of SARS-CoV2 sequences according to the age. More than half of the sequences (52.6%) were obtained from individuals aged 25 to 50 years, with statistical significant difference (P = 0.0004 < 0.001), obtained by student's t-test. An important proportion is also found among children and teenagers (20.0%) as well as seniors aged 50-65 (18.4%). Only 9.0% of sequences were obtained from individuals aged 65 years and more.

Sequence analysis and variant assignement
Among the 484 whole genome sequences, variant and subvariant could be assigned for 468 sequences displaying < 15% ambiguous nucleotides.
Using Nextclade program, the 468 complete sequences belonged to clades 21J, 21I and 21A which correspond to the Delta variant. The majority of sequences constitute a group belonging to the 21J clade and other sporadic strains grouped within the 21A, 21I clades (Supplementary Figure S1). Similar group profile was obtained with the Maximum Likelihood (ML) tree obtained by Iqtree software (Supplementary Figure S2

Mutational cascade
For more investigation about evolution trait of emergent AY.122 variant (n = 165) in Tunisia, we constructed the mutational cascade based on comparison with the first reported AY.122 strain (Figure 4). According to the mutational analysis, the most mutant strains are: hCoV-19_TUN_Nab_S1343_27102021 and hCoV-19_TUN_Nab_S1349_02112021, where they accumulated more than 30 mutations in their genomes ( Figure 4A). Figure 4B showed that most genomes accumulated 20 to 26 mutations during AY.122 variant emergence in Tunisia. According to the Figure 4C, the single nucleotide polymorphism (SNP) is the most prevalent mutation class among the AY.122 variant circulating in Tunisia. The mutational analysis also showed that C>T and A>G transitions are the most frequent mutation events ( Figure 4D).

Phylogenetic analysis
The phylogenetic analysis was performed on the AY.122 complete genome sequences, representing the most frequently detected Delta sub-lineage. The analysis involved a total of 391 AY.122 virus genomes including the Wuhan reference sequence as outgroup, 165 Tunisian sequences with complete genome coverage and 225 sequences from different geographic regions of the world. Indeed, to obtain reliable results, only complete sequences without ambiguous nucleotides (N) into the coding regions were considered. The 165 AY.122 complete genome Tunisian sequences originated from 17 different governorates and during the whole study period. Among the AY.122 complete genome sequences from other countries, available on Genbank, up to10 complete sequences per country and per month were first selected (n = 780) and then only   Figure 5 represents the Maximum-Likelihood tree generated using the GTR+F+R2 model (Best fit model according to BIC) with a bootstrap replication of 1000 cycles. The majority of Tunisian sequences (163 out of 165) clustered together, independently from sequences of other countries. They were grouped into three major clusters. Cluster1 included 17 Tunisian sequences collected in limited areas of Tunisia (Tunis and Kairouan); they formed two groups according to their geographic origin. Cluster2 included a large number of Tunisian sequences (n = 105) from different governorates. Sequences from same region were highly related and likely to form different groups. This cluster included few sequences .
/fpubh. .  from UK. The Cluster 3 combines 43 sequences from three Tunisian regions (the region of Tunis, Nabeul and South Tunisia) with few sequences from Germany, Switzerland and UK. Figure 6 displays the mutational profile of the Tunisian sequences. The web-based software Nextclade v1.12.0 (https:// clades.nextstrain.org/) was used for gene-wise amino acid mutations analysis (32). In addition to the amino-acid changes and mutations specific to the AY.122 sub-lineage, indicated in blue, all Tunisian sequences, from the three clusters, shared the amino-acid change A498V in ORF1a (indicated in dark purple), the synonymous mutations 2746T>C, 3037C>T, 8986C>T and 11332A>G in ORF1a and 23683C>T in the S gene (indicated in light purple). The same changes were also found with high frequency in all the 443 Tunisian AY.122 sequences: A498V was present in 441 sequences, 2746T>C, 3037C>T, 8986C>T, 11332A>G were found in 418, 441, 437 and 439 sequences, respectively and 23683C>T was found in 433AY.122 sequences.
In addition, the Tunisian sequences from Clusters 1 and 3 differ from the cluster 2 by the presence of the amino-acid changes S: G181V and the synonymous mutations 5392C>T, 19245C>T (indicated in Green). Also, strains from Cluster 3differ from the other by the presence of ORF1b: S2689G amino acid change.

Spatio-temporal dynamics of AY. Tunisian sequences
From the data set composed of the 165 AY.122 Tunisian sequences, with no ambiguous nucleotides, an ML tree was constructed and used to explore the temporal signal of the selected dataset. The root-to-tip regression analysis shows a positive correlation (R 2 = 0.37) between genetic distances and sampling dates (Supplementary Figure S3).
Molecular clock analysis shows that the TMRCA (Time to the Most Recent Common Ancestor) for the root of AY.122 variant in Tunisia was estimated to be around October 03th, 2020 [July 19th, 2020-January 02th, 2021], which could correspond to the date of its first emergence in the world (6).
To gain insight into spatio-temporal dynamics of the AY.122 variant dispersal in Tunisia, we performed a Bayesian analysis based on phylogeography framework reconstruction. In the time-scaled phylogeographic MCC (Maximum Clade Credibility) tree, we observed that most nodes and branches have colors assigning the inferred location of Tunis (Supplementary Figure S4). Furthermore, the root state posterior probability (RSPP) analysis shows that Tunis has the highest value (RSPP = 0.95). This finding indicates that Tunis might be responsible for the spread of the AY.122 Delta SARS-CoV-2 variant in the country.
The spatial estimates of the MCC tree were than mapped on Google Earth (Figure 7, Supplementary material 1). The generated configuration of spatial dynamics shows that the AY.122 dispersal in Tunisia started from Tunis (the capital) to Nabeul (North-east). The virus was transmitted then from Nabeul to Kairouan, in the center of the country. Soon after, the role of Nabeul as primary epicenter of AY.122 variant became clearer; transitions were observed transitions with Manouba (North-east) and then with Gafsa (central-west). In this period, the AY.122 infection zones started to expand across the country from Nabeul, Tunis and Kairouan. Subsequently, Tunis and Kairouan became secondary epicenters of intensified AY.122 outbreaks which affected more governorates in the North, in the Center, and the South.
To provide statistical support for the phylodynamic reconstruction, we determined the Bayes Factor (BF) for the identified transitions between the different Tunisian governorates ( Figure 8A, Supplementary Table S2). We found that all the transitions between the governorates are statistically supported (BF>3) except for the transition "Nabeul-Tataouine" (BF = 2.58). The highest Bayes Factor (BF>100) was recorded for the transitions linking: Tunis to Ariana (BF = 137653.83), .
/fpubh. . Higher decisive support of the Bayes Factor was found when governorates were assembled by regions ( Figure 8B

Discussion
Since May 2021, the Delta variant dominated all the other SARS-CoV-2 variants and continued to largely shape the pandemic up to February 2022 (15). It overwhelmed health care systems, infecting mainly unvaccinated and non-fully vaccinated populations, especially in countries with limited access to vaccines (41, 42).
The Delta wave started in Tunisia, in May 2021. The present work shows that Delta infections occurred mainly in young adults. The age distribution of investigated cases showed a large peak between the ages of 25 to 50 years, but several cases were also found in children and teenagers, as well as seniors aged 50 to 65 years. Indeed, COVID-19 vaccination started in Tunisia only by March 13, 2021, with a slow cadence. Health worker and individuals aged 75years and over were first vaccinated, followed by individuals from age groups 65-74 and then 60-64 years (43). Low vaccination coverage was achieved by May 2021, when Delta was introduced, and slow progression of vaccination campaigns was reported. Vaccination of individuals from age groups 50-59, 40-49, 30-39 was initiated progressively from June 1, to August 9, 2021(44-46). Vaccination of the age group 18-29 started later, by October 10, 2021 (47). Thus, the stuttering vaccination campaign, coupled the advent of the delta variant may be the cause of the deterioration of the health situation in Tunisia and explain the age distribution of Delta cases that we report herein. Furthermore, when Delta was introduced in the country, most of the already vaccinated peoples received only one dose which has been reported to be insufficient to induce effective protection against the Delta variant. Indeed, clinical evidence indicated decline in vaccine's effectiveness in the case of Delta variant infection (48). In vitro tests also showed lower neutralization activity of vaccine-induced antibodies against the Delta variant (49) (7). In our series and among all the Delta sub-lineages reported globally, twelve were detected in Tunisia with a large predominance of the AY.122 sub-lineage, which represented more than 94% of detected sequences. This sub-lineage was present in the country, from the beginning of the delta wave in May 2021 till its end in December 2021. It also spread to almost all regions of the country, including rural and urban regions. This suggests that it has established an autochthone transmission. Although, the number of samples obtained from rural regions is limited, the localization of main Tunisian population into the coastal regions around the big tows and the high connection between population in rural and urban regions are online with our finding. In the other hand, the eleven other variants were detected during shorter periods of time and in more restricted geographical areas, especially in the North-East region where are found the biggest cities and the main airports. This suggests that they were introduced through multiple importation events, mainly since August 2021, and that these minor sub-lineages were not able to establish autochthone transmission in the presence of the already circulating AY.122 sub-lineage. The restricted number of circulating sub-lineages during the Delta wave in Tunisia may also reflect the efficiency of public health measures taken place at the national gateways and entry points connected to the Tunisian border. The observation of isolation measures by travelers after arrival at the national territory may have modulated dispersion of numerous sub-lineages. In other contexts, such as in India, multiple introductions at the same time resulted in a rapid Frontiers in Public Health frontiersin.org . /fpubh. . community spread and a wide dispersion of sub-lineages among the population (52). In Tunisia, the main population is located at coastal regions with important exchanges between rural and urban populations. The AY.122 sub-lineage was first detected in India, 2020-09-07 (8). It's mainly circulating in Eastern-European regions such as Belarus, Armania, Russia (>40% of sequenced samples) and is also frequently detected in other European countries such as Monaco (>40%) and in Asian regions such as Kazakhstan and China (41 and 27%, respectively) (7). AY.122 also represents 70% sequenced samples in Seychelles (7). Given the strategic localization of Tunisia in the middle of the Mediterranean Sea, the high trade exchange with Europe, the high number of tourists visiting Tunisia and also the localization of the Tunisian population mainly in coastal regions, AY.122 may have been imported from Europe, especially from Eastern-Europe region. Indeed, an increased number of tourists coming from this region during the last years was recorded. In the other hand, countries such as Monaco and Seychelles also constitute privileged destinations for Tunisian tourists (53). Spatio-temporal analyses conducted in the present work suggest the introduction of the AY.122 sub-lineage first in Tunis, the capital and then its propagation to two other big cities that constituted after-on, together with Tunis, epicenters for the propagation of the virus all over the country. The mutational cascade describing important evolution of virus obtained in November-December, from big Tunisian cities confirm also our finding. Similar widespread of the same sub-lineage in Russia was recently reported (54). In Tunisia, and especially in big cities, it is evident that the local commercial exchanges, regular market visits and families' celebrations, during the summer time may have accelerated community transmission. The low SARS-CoV2 vaccine coverage achieved during the Delta wave as well as the decline in prevention measures within the population may explain the establishment of autochthone transmission.
Noticeably, the AY.122 Tunisian investigated sequences showed were highly related to each other and constituted an individualized genetic group independently from most of the sequences from other countries included in the phylogenetic tree. The mutational profile reveals that most of the Tunisian sequences share the A498V amino acid change in the ORF1a and 5 other synonymous mutations: 2746T>C, 3037C>T, 8986C>T, 11332A>G in ORF1a and 23683C>T in the S gene. In addition, the amino acid changes S: G181V; W258R were found with important rates. To the best of our knowledge, this is the first report of such amino acid changes among the AY.122 sub-lineage. The substitution A498V in ORF1a was previously reported among Wuhan-like strains Malaysian sequences, obtained during the first and second SARS-CoV2 waves, January-February 2020 (55). The G181V was reported in the Brazilian P.1 variant, but its impact was not investigated (56). The W258R in the S gene was reported among COVID-19 family clusters, including severe cases (57). The spread of viral lineages may rely on different factors such as virus properties, host genetic and immunological background, demography and socio-cultural behavior of the population and, probably, the combination of different factors. The AY.122 Tunisian strains may owe their successful spread to the presence of particular amino-acid substitutions. Even no evidence of increased virulence or transmission related to these changes were reported, It will be relevant to investigate their impact on protein structure and function to better understand the AY.122 spread.
In conclusion, this study adds to the knowledge on the Delta variant and sub-variant distribution worldwide and documents genomic data from Tunisia, North Africa. The introduction of a restricted number of delta variants in Tunisia may reflect the efficiency of public health measures taken place at the national gateways and entry points connected to the Tunisian border. Nevertheless, the decline in prevention measures within the population may explain the establishment of autochthone transmission of AY.122 sublineage.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary material.

Ethics statement
The studies involving human participants were reviewed and approved by Bio-Medical Ethics Committee of Pasteur Institute of Tunis, Tunisia (2020/14/I/LR16IPT/V1). Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.