Molecular Epidemiology of AY.28 and AY.104 Delta Sub-lineages in Sri Lanka

Background The worst SARS-CoV-2 outbreak in Sri Lanka was due to the two Sri Lankan delta sub-lineages AY.28 and AY.104. We proceeded to further characterize the mutations and clinical disease severity of these two sub-lineages. Methods 705 delta SARS-CoV-2 genomes sequenced by our laboratory from mid-May to November 2021 using Illumina and Oxford Nanopore were included in the analysis. The clinical disease severity of 440/705 individuals were further analyzed to determine if infection with either AY.28 or AY.104 was associated with more severe disease. Sub-genomic RNA (sg-RNA) expression was analyzed using periscope. Results AY.28 was the dominant variant throughout the outbreak, accounting for 67.7% of infections during the peak of the outbreak. AY.28 had three lineage defining mutations in the spike protein: A222V (92.80%), A701S (88.06%), and A1078S (92.04%) and seven in the ORF1a: R24C, K634N, P1640L, A2994V, A3209V, V3718A, and T3750I. AY.104 was characterized by the high prevalence of T95I (90.81%) and T572L (65.01%) mutations in the spike protein and by the absence of P1640L (94.28%) in ORF1a with the presence of A1918V (98.58%) mutation. The mean sgRNA expression levels of ORF6 in AY.28 were significantly higher compared to AY.104 (p < 0.0001) and B.1.617.2 (p < 0.01). Also, ORF3a showed significantly higher sgRNA expression in AY.28 compared to AY.104 (p < 0.0001). There was no difference in the clinical disease severity or duration of hospitalization in individuals infected with these sub lineages. Conclusions Therefore, AY.28 and AY.104 appear to have a fitness advantage over the parental delta variant (B.1.617.2), while AY.28 also had a higher expression of sg-RNA compared to other sub-lineages. The clinical implications of these should be further investigated.


INTRODUCTION
The SARS-CoV-2 virus continues to result in outbreaks in many geographical regions, with the number of cases exponentially increasing in many countries due to the rapid transmission of Omicron (1). Of the five variants of concern (VOCs) that have been identified so far, the delta variant is associated with more severe disease compared to other variants (2,3). Until Omicron emerged, the delta variant was the most transmissible variant and rapidly displaced all other VOCs and variants of interest (4). Due to the higher transmissibility and increased virulence of the delta variant, outbreaks due to the delta wave were associated with the highest mortality, intensive care admissions and hospitalizations so far in all countries (1).
As the delta variant was the dominant variant globally before emergence of Omicron for the longest time period during the COVID-19 pandemic, it gave rise to over 100 sub lineages. Phylogenetic Assignment of Named Global Outbreak (PANGO) nomenclature has currently assigned 122 sub lineages of delta (AY.1 -AY.122) that are distributed in different geographical regions (4). The sub-lineages of the delta variants have not shown to be functionally different to the parental delta variant (B.1.617.2) and have shown to have a similar susceptibility to neutralizing antibodies (5). However, some of these sub lineages such AY.4.2 have been assigned as a variant of interest due to its possible higher transmissibility compared to B.1.617.2 the parental delta variant (6). While certain mutations such as the presence of A222V has shown to cause slightly higher viral titers, which is thought to result in a higher transmissibility (7), mutations such as the E484K have a possibility of enhanced immune evasion (8). Therefore, it is important to study the evolution and spread of different delta sub lineages to understand their transmission and to detect possible changes associated with virulence and immune evasion. Sri Lanka experienced 3 major outbreaks since the identification of patient zero in March 2020. The first large outbreak, which occurred from October 2020 to January 2021 was driven by B.1.411 which peaked at 800 cases/day until it was completely replaced by B.1.1.7 (alpha) in mid-April (9). The outbreak due to the alpha variant, which began in April 2021 to June 2021, with the reported number of daily cases as high as 9,950 (10). During this time period (October 2020 to June 2021), apart from 9 imported B.1.351 (Beta) and B.1.525 (Eta) cases which were identified in overseas visitors in quarantine facilities (11), Sri Lanka did not experience any outbreaks due to other VOCs.
The largest Sri Lankan SARS-CoV-2 outbreak was due to the delta variant and its sub lineages from July to end of October 2021 (12). During the peak of this 'delta wave' the PCR positivity rates rose above 30% with case fatality rates reaching 6.35% (12). Apart from the delta parental lineage B.1.617.2, two other delta sub lineages AY.28 and AY.104 were the predominant variants observed during this time period (4). The AY.28 and AY.104 were assigned as Sri Lanka delta sub-lineages as they were found to originate in Sri Lanka and to be transmitted to all continents in the world (13,14). In this study, we discuss lineage defining mutations, sub genomic RNA expression, relative frequency over time and clinical disease severity of individuals infected with either AY.28, AY.104 or B.1.617.2 in Sri Lanka.

Identification of AY.28 and AY.104 Lineages in Sri Lanka
A total of 1,091 sputum or nasopharyngeal swab samples collected from individuals who presented to government and private hospitals with a COVID-19 like illness, and samples collected as a part of the sentinel surveillance of influenza-like illness (ILI) or acute respiratory infection (ARI) from mid-May to November 2021 were sequenced using Illumina (n = 188) and Oxford Nanopore (n = 903) platforms. 705/1,091 delta sequences with > 70% genome coverage were included in the analysis. Details of RNA extraction, library preparation, and analysis are given in Supplementary Methods. Of these 335/705 (47.5%) were assigned to AY.28 and 217/705 (28.5%) were assigned to AY.104, while 68/705 (9.6%) were assigned to B.1.617.2. The rest of the sequences were assigned to various AY sub-lineages of Delta (<24) by pangolin v3.1.16 (https://github.com/cov-lineages/pangolin) with the pangoLearn model released on 2021-11-25. In order to analyze the frequency of infections caused by these sub-lineages over time, we analyzed the relative change in the frequencies of AY.28 and AY.104 in the Colombo district from 15 th July to 30 th November 2021. As the frequency of delta was <50% of the SARS-CoV-2 viruses that were sequenced before 15 th July 2021, they were not included in the analysis. Metadata and statistical analysis of all the 705 samples are included in the Supplementary Table 1.
The clinical disease severity and vaccination status of 440/705 individuals who were found to be infected with the delta variant during this time period were further analyzed in order to determine if infection with either AY.28 or AY.104 was associated with more severe disease or with the type of vaccination. Those who were not hospitalized or who were hospitalized and were not given oxygen were considered as having mild illness, whereas those who were given oxygen or required intensive care admission were classified as moderate/severe based on the WHO guidelines in COVID-19 clinical disease classification (15). Pearson's Chi-squared test was used to determine the associations between categorical variables (gender, vaccination, vaccine dose, and disease severity) and sub-lineages while pairwise T-tests with Bonferroni correction to compare means of age and hospitalization period. All statistical tests were done using R version 4.1.2.

Mutational and Phylogenetic Analysis of AY.28 and AY.104
In addition to the delta variants sequenced by us, all the AY.28 (n = 519) and AY.104 (n = 493) sequences available at GISAID were downloaded from the GISAID database and aligned to the reference sequence using Nextalign (https://github.com/ neherlab/nextalign). Amino acid mutations and their frequencies for Spike, ORF1a, ORF1b, and N proteins were calculated, and frequencies of ambiguous amino acids derived from ambiguous nucleotides were removed using in-house python scripts. The phylogenetic tree was inferred by Maximum Likelihood in IQ-Tree (version 1.6.12) using the GTR+G model of nucleotide substitution and 1000 replicates of ultrafast bootstrapping (-B 1000) and SH-aLRT branch test (-alrt 1000). The ML tree was then time stamped with TreeTime (16) (version 0.7.5) using least-squares criteria and the evolutionary rate of 1.1 * 10 −3 subs/site/year as described by Duchene et al. (17). Six sequenced with inconsistent temporal signal were removed from the analysis. The tree was rendered using ggtree in R version 4.1.2.

Sub-genomic RNA Expression of AY.28 and AY.104
Sub-genomic RNA expression of 705 delta genomes (335 AY.28, 217 AY.104 and 68 B.1.617.2) were analyzed using periscope (https://github.com/sheffield-bioinformatics-core/periscope). This algorithm aligns raw reads against the SARS-CoV-2 reference genome (MN908947.3) and identifies reads that contain the leader sequence at their start position. Depending on the amplicon position, the sgRNA detected reads were counted and classified into each ORF of the virus. Means of sgRNA counts normalized to per 1000 genomic RNA reads, were then compared individually for each of the ORFs using the unpaired Wilcoxon test by adjusting p-values with the Holm method. Statistical analysis results are presented in Supplementary Table 2

RESULTS
We have previously described the changes in the SARS-CoV-2 variants from the onset of the pandemic to May 2021 in Sri Lanka (9). The first delta variant was identified in Sri Lanka on 22 nd May (AY.28), and the relative frequency of delta and sub lineages in Sri Lanka is shown in Figure 1A. By the first week of December 2021, out of all the delta variants sequenced from Sri Lanka, 481/974 (49.4%) belonged to AY. 28  Mutational and Structural Analysis of AY.28 and AY.104 AY.28 had three lineage defining mutations in the spike protein: A222V (92.80%), A701S (88.06%), and A1078S (92.04%). ORF1a of AY.28 consists of seven lineage defining mutations: R24C, K634N, P1640L, A2994V, A3209V, V3718A, and T3750I with prevalence between 97.83% to 87.96%, and A1918 (96.59%) in ORF1b, and G215 (96.21%) in the N protein, which is present in the SARS-CoV-2 wild-type virus ( Figure 1B). AY.104 was characterized by the high prevalence of T95I (90.81%) mutation and T572L (65.01%) mutation in the spike protein, A1918V (98.58%) in ORF1a, G215C mutation (98.98%) in N protein. In addition, it is characterized by the absence of P1640L (94.28%) in ORF1a as seen in the wild-type virus ( Figure 1B).
The pre-fusion (closed-conformation) surface representation of the SARS-CoV-2 B.1.671.2 spike trimer (PDB: 6ZGE) mapped by PyMOL v.2.4.0 is shown in Figure 1C. The receptor-binding domain (RBD) is colored in orange while the N-terminal and S2 subunit are shown in gray. Amino acid substitutions and deletions that correspond to each delta sub-lineage are shown in two separate diagrams. Ribbon diagram view of the spike trimer common to both AY.28 and AY.104 of the same is shown on the top right ( Figure 1C). Interestingly, a separate node of AY.28 predominantly sampled in USA appears to have originated in mid-January 2021. The tips are color annotated according to the country where the sequences were identified, which shows Sri Lanka as the likely country of origin (Figure 3). Majority of the AY.28 and AY.104 sequences detected outside of Sri Lanka were reported from the United Kingdom, India, Japan, Canada and Australia (Figure 3).

Sub-genomic RNA Expression of Delta Sub-lineages
We analyzed 335 AY.28 genomes and 217 AY.104 and 68 B.1.617.2 for expression of sgRNA. The highest sgRNA expression was observed in the spike protein of B.1.617.2 and the sub-lineages followed by ORF7a (Figure 4). The mean sgRNA expression levels of ORF6 in AY.28 was significantly higher compared to AY.104 (p < 0.0001) and B.1.617.2 (p < 0.01). Also, ORF3a showed significantly higher sgRNA expression in AY.28 compared to AY.104 (p < 0.0001). However, AY.104   Of the 440 individuals who were infected with the delta variant, the mean age of those infected with AY.28 was 38.39 (SD ± 15.32) and AY.104 was 38.27 (SD ± 17.36) and therefore was not significantly different (p = 0.86). There was no difference in the gender of those who were infected with different delta sub-lineages (p = 0.39). 156/160 (98%) individuals infected with AY.28 had mild infection, while 3/160 (1.88%) had moderate/severe illness and one individual succumbed to the illness. Of those who were infected with AY.104 all 63/63 (100%) patients developed mild illness. All those who were infected with B.1.617.2 18/18 (100%) also developed mild illness and there were no deaths. However, there was no significant difference between clinical disease severity in these three groups (p = 0.72). Individuals were hospitalized between an average of 3 to 60 days. The mean duration of hospitalization of those infected with AY.28 was 13.91 (SD ± 5.88) days, for AY.104 a mean of 14.33 (SD ± 5.93) days and for B.

DISCUSSION
In this study we have described the molecular epidemiology of the delta variant and its sub lineages during a massive outbreak in Sri Lanka, which occurred from July to November 2021. We detected the first delta variant in the community in the Colombo Municipality Council area in the third week of May 2021. This initial cluster which originated in Colombo belonged to AY.28 sub-lineage, while the first AY.104 infection was detected in the 2 nd week of June. By the first week of December, of the delta variants sequenced, 46.3% were of the AY.28 sub lineage, while only 33.6% were of AY.104 sub lineage. AY.28 was seen to dominate over the parental B.617.2 lineage, possibly due to the presence of the A222V mutation, which has been previously suggested to associate with higher transmissibility (7). It was shown that the presence of the A222V mutation promotes an increased opening of the receptor binding domain (RBD) and slightly increases binding to ACE2 compared to the D614G SARS-CoV-2 variant (18). However, although AY.28 was the dominant variant during a major part of the outbreak, an equal prevalence of AY.104 and AY.28 was seen since September 2021, after AY.104 was first detected in June 2021. Although it is not known if the T95I mutation in spike seen in AY.104 gave it a fitness advantage in transmission over the parental delta variant, AY.104 is also likely to be more transmissible than B.1. accounted for 3.39% of the sequenced variants by the first week of December.

as [ref]as this only
The spike protein of AY.28 has 13 mutations (>75% prevalence) while AY.104 and the parental delta variant (B.1.617.2) have only 11 and 10 mutations respectively. Furthermore, the ORF1a of AY.28 has 7 mutations, whereas AY.104 and B.1.617.2 have 6 mutations. However, AY.28 has fewer mutations in ORF1b and nucleocapsid (N) proteins compared to the other two lineages. Mutations in the N protein has shown to increase infectivity, virulence, and fitness of the virus, with the R203K/G204R mutations being associated with more severe disease in hamster models (19). The R203K/G204R mutations were shown to inhibit GSK-3 kinase and therefore, resulted in increased viral replication (20). The R203M mutation is seen in AY.28, AY.104 and the parental delta lineage and the R203M was shown to enhance immune evasion by the delta variant along with the L452R mutation (21). In addition to the mutations in the R203 region, G215C, which is a lineage defining mutation in AY.104 has shown to improve viral assembly leading to higher viral loads (22). Therefore, in addition to the mutations in the spike protein, certain mutations in the N protein appear to lead to increased viral virulence, infectivity and immune evasion. Unfortunately, due to the non-availability of biosafety 3 level laboratory facilities, we could not isolate these viruses and further characterize the significance of these mutations. Our analysis of clinical disease outcomes and duration of hospitalization in a sub cohort of individuals infected with AY.28, AY.104 and B.1.617.2 showed that there was no difference in the clinical disease severity or duration of hospitalization in individuals infected with these sub lineages. However, only 3 individuals in those cohort developed severe disease, while only one individual succumbed to the illness. Therefore, since only 4/440 (0.9%) individuals (all infected with AY.28) had adverse disease outcomes, the sample size is unlikely to be adequate to determine if infection with these sub lineages associate with more severe disease.
Structural proteins of coronaviruses are first transcribed into sgRNA before translation (23). Although the presence of sgRNA per se does not indicate the presence of actively replicating virus, the relative abundance of sgRNA indicates the relative expression of different ORFs of the virus (23,24). Many ORFs of the SARS-CoV-2 have shown to suppress interferon gene transcription, interferon production and recognition by innate immune responses, thereby inhibiting innate immune antiviral responses (25). Although our data showed that there were no significant differences in the total sgRNA levels between the two sub-lineages and the parental delta, sgRNA expression was significantly lower in AY.104 for ORF3a, ORF6 and ORF7a compared to the other two lineages. All these ORFs play an important role in evading the host interferon responses by suppressing STAT1 and STAT2 phosphorylation, inhibiting STAT1 complex nuclear translocation and interacting with STING and preventing nuclear translocation of NFκβ (25)(26)(27). Therefore, increased expression of sgRNA of ORF3a, ORF6 and ORF7a by AY.28 and parental B.1.617.2 compared to AY.104, may associate with increased virulence due to suppression of host IFN responses. On the contrary, AY.104 had a higher sgRNA expression of ORF8 and the N genes compared to the other lineages. ORF8 has shown to inhibit IRF3 nuclear translocation and N protein has shown to inhibit RIG-1 signaling (25). However, these sgRNA analysis data are only suggestive of such a possibility and isolation of these viruses and further studies in vitro and in vivo would be required to draw further conclusions.
The AY.28 and AY.104 delta sub lineages that originated in Sri Lanka, spread to several countries within a few weeks. AY.28 was detected in 42 countries by now, while it has predominantly been reported in USA, Japan, India and United Kingdom, reflecting the main travel destinations of Sri Lankan individuals (13). Large divergent cluster of AY.28 seen in USA appears to be a spread of an individual case from early days of the lineage according to the most recent common ancestor (tMRCA) in late January 2021. AY.104 is currently detected in 22 countries, while again predominantly been reported in United Kingdom, India, Canada and Qatar (14). One of the main other sub lineages reported in Sri Lanka was AY.95 (8.2%), which in thought to have originated from the Maldives (28). Therefore, this sub lineage appears to have been introduced due to frequent travel between Sri Lanka and Maldives.
In conclusion, the massive outbreak due to the delta variant in 2021 was predominantly due to two delta sub lineages AY.28 and AY.104. These two Sri Lankan sub lineages accounted for over 80% of the sequenced delta variants in Sri Lanka from July to December, while AY.28 was the cumulatively predominant sub lineage. Although the A222V mutation and significantly higher sgRNA expression in certain ORFs possibly contributed to an enhanced suppression of interferon genes in AY.28 thereby giving it a fitness advantage over AY.104, similar frequencies of AY.28 and AY.104 at the later stage of the outbreak suggest that both sub lineages may have comparable transmissibility. It would be important to further investigate the relevance of the findings by isolating these viruses, in order to understand the evolution and virulence of SARS-CoV-2 variants during this pandemic.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Review Committee, University of Sri Jayewardenepura. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.