Overview of the SARS-CoV-2 genotypes circulating in Latin America during 2021

Latin America is one of the regions most strongly impacted by the COVID-19 pandemic, with more than 72 million reported infections and 1.6 million deaths as of June 2022. Since this region is ecologically diverse and is affected by enormous social inequalities, efforts to identify genomic patterns of the circulating SARS-CoV-2 genotypes are necessary for the suitable management of the pandemic. To contribute to the genomic surveillance of SARS-CoV-2 in Latin America, we extended the number of SARS-CoV-2 genomes available from the region by sequencing and analyzing the viral genome from COVID-19 patients from seven countries (Argentina, Brazil, Costa Rica, Colombia, Mexico, Bolivia, and Peru). Subsequently, we analyzed the genomes circulating mainly during 2021, including records from Latin America in the GISAID database. A total of 1,534 genome sequences were generated from seven countries, demonstrating the laboratory and bioinformatics capabilities for genomic surveillance of pathogens that have been developed locally. For Latin America, patterns regarding several variants associated with multiple re-introductions, a relatively low percentage of sequenced samples, as well as an increase in the mutation frequency since the beginning of the pandemic, are in line with worldwide data. In addition, some variants of concern (VOC) and variants of interest (VOI) such as Gamma, Mu and Lambda, and at least 83 other lineages have predominated locally with country-specific enrichment. This work has contributed to the understanding of the dynamics of the pandemic in Latin America as part of the local and international efforts to achieve timely genomic surveillance of SARS-CoV-2.


Introduction
In December 2019, several cases of a new respiratory illness were described in Wuhan, China. About a month later, it was confirmed that the illness COVID-19 (coronavirus disease 2019) was caused by a novel coronavirus which was subsequently named SARS-CoV-2 (1, 2). Until June 2022, the COVID-19 pandemic had impacted the world with >549 million confirmed cases of COVID-19, including >6.3 million deaths. Latin America was one of the most strongly impacted regions with more than 72 million reported infections and >1.6 million deaths during the same period.
SARS-CoV-2 genome sequences have been reported from many regions of the world and these data have proven useful in tracking the global spread of the virus. Genomic epidemiology of SARS-CoV-2 has shed light on the origins of regional outbreaks, global dispersal, and epidemiological history of the virus (3,4). Until April 2022, over 11.5 million genomes had been deposited in the GISAID database (https://www.gisaid.org/), out of which >376,000 were reported by Latin American countries.
Since its appearance, a large genetic diversity has been recognized for SARS-CoV-2 due to widespread transmission and geographical isolation (5). The emergence of new genotypes (lineages, clades, variants, etc.) is the product of a natural process that occurs when viruses replicate at high rates as it happens during a pandemic (4). The World Health Organization (WHO) has classified five divergent genotypes as variants of concern (VOC: Alpha, Beta, Gamma, Delta, Omicron), as well as some lineages into variants of interest (VOI: Lambda, Mu, Epsilon, Zeta, Theta, Iota, Eta, Kappa, and others) and variants under monitoring (VUM: B.1.640 and XD) (6). All reported variants and other lineages have been identified in Latin America (7), including genotypes that were first reported regionally, such as Gamma in Brazil, Mu in Colombia, and Lambda in Peru (6), as well as unique lineages in Costa Rica and Central America (8,9). Those descriptions of locally enriched genotypes exemplify the opportunities that SARS-CoV-2 has found in Latin America for spreading and evolving. This scenario is in part explained by the complex environmental and human reality in this region, with huge ecological diversity and social inequalities (10, 11). Thus, efforts on revealing the behavior of SARS-CoV-2 are necessary to identify regionally emerging patterns for the suitable management of the pandemic, which cannot be inferred from North America, Europe, or Asia (11).
In this context, the CABANA initiative (Capacity building for Bioinformatics in Latin America, Global Challenges Research Fund GCRF: www.cabana.online) supported the development of a regional project titled "The SARS-CoV-2 genome, its evolution and epidemiology in Latin America" during 2021. The project had the direct participation of seven institutions from Argentina, Brazil, Bolivia, Colombia, Costa Rica, Mexico, and Peru. Efforts of this project included not only the sequencing and genome assembly of the SARS-CoV-2 virus from a total of 1,534 COVID-19 cases in those countries, but also a more complete overview of the SARS-CoV-2 genotypes circulating in Latin America during 2021 using public databases. Thus, this study aimed to contribute to the genomic surveillance of SARS-CoV-2 to understand the dynamics of the pandemic in Latin America by providing genome sequences and analyzing circulating genotypes during the year 2021.

Samples and ethical considerations
Respiratory samples were obtained from public and private laboratories belonging to the national network of SARS-CoV-2 diagnostics in each country. Adequate transportation and storage conditions were guaranteed to preserve the samples. Every sample was anonymized to protect patients' identity. Because COVID-19 is a notifiable disease, the metadata were collected from the forms that accompanied the samples, either in the national reference laboratories or in the ministries of health. See Supplementary material for IDs to access metadata in the GISAID database.

Sample sequencing and genome analysis
To contribute SARS-CoV-2 genome sequences from Latin America, seven participant countries (Argentina, Bolivia, Brazil, Costa Rica, Colombia, Mexico, and Peru) were involved in sample processing from COVID-19 patients. Diagnosis using RT-qPCR, genome sequencing and assembly, as well as genotyping, were implemented using the laboratory protocols and bioinformatic pipelines that are being locally used as part of the genomic surveillance efforts in each country as shown in Table 1 and reported in (9,12,13). Genome sequences were uploaded to the GISAID database (https://www.gisaid.org/). Details regarding the number of processed samples (assembled genomes) and the laboratory and bioinformatic protocols for each country are summarized in Table 1. GISAID accession numbers (ID) for assembled genomes are presented in Supplementary material.

Analysis of circulating SARS-CoV-2 genotypes in Latin America
To gain insights into the SARS-CoV-2 genotypes circulating in Latin America during 2021, a general analysis was done using the genome sequences available at the GISAID database (https://www.gisaid.org/). Selection of countries, statistics of sequenced samples, and plots of circulating genomes and mutation frequency were obtained using the tools of the GISAID platform. The number of COVID-19 cases per country was retrieved from the daily reports of the Pan American Health Organization (14). All analyses were performed considering sequences collected until January 31st, 2022. The PANGOLIN lineage database (15, 16) was used to analyze the frequency of lineages among countries.

Results and discussion
Genomic surveillance has been a hallmark of the COVID-19 pandemic that, in contrast to other pandemics, achieves tracking of the virus evolution and spread worldwide almost in real-time (4).
In this work, we extended the repertoire of SARS-CoV-2 genome sequences with a total of 1,534 sequences from seven Latin American countries (Table 1). Although this was a relatively modest contribution to the overall quantity of sequences produced in this period in Latin America, for certain time intervals and countries it provided an important complement to the genomic surveillance of the virus. In Bolivia, for example, our efforts represented 38% of all sequences produced over this time. To perform a more complete examination, we included all sequences from Latin America available at the GISAID database collected up to January 2022. A total of 221,228 genome sequences, including the 1,534 provided by this work, were analyzed by genotype and mutation profile.
According to the GISAID database records, the number of sequences is still small in comparison to the number of diagnosed cases in Latin America (Table 2). On average, only 0.39% of COVID-19 cases in Latin America had been sequenced, with Mexico and Chile having the highest rates with 0.98 and 0.92%, respectively. In the case of Nicaragua, in which the pandemic has been downplayed (17,18), the reports of diagnosed patients and other statistics are considered unrealistic, including the 2.92% of sequenced samples. Thus, we did not conduct comparisons between Nicaragua and other countries due to the extremely biased data.
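As a minimal sketch of how a percentage of sequenced cases such as those in Table 2 can be derived, the number of deposited genomes for a country is divided by its number of reported cases and multiplied by 100. The function name and the counts below are hypothetical and are not values taken from Table 2.

```python
def sequenced_percentage(n_genomes: int, n_cases: int) -> float:
    """Percentage of reported cases for which a genome was deposited."""
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    return 100.0 * n_genomes / n_cases


# Hypothetical counts for illustration only (not values from Table 2).
example = sequenced_percentage(n_genomes=3_900, n_cases=1_000_000)
print(f"{example:.2f}% of reported cases sequenced")
```

Because both counts depend on national testing and reporting practices, the resulting percentage inherits any bias in either figure.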
At the other extreme, Bolivia, Honduras and Venezuela have sequenced barely 0.03% of samples derived from all patients diagnosed with the disease. There is no single Latin American country that has sequenced more samples, relative to the number of cases reported, than the world average of 2.04%, which is itself low. The current scenario is congruent with a previous report with <0.5% of sequenced samples for Latin American countries (19). These findings reflect not only the regional disparities in the SARS-CoV-2 genomic surveillance efforts in Latin America, but also the need for this geographic region to increase the effort to sequence at least 5% of positive samples in order to detect emerging viral lineages when their prevalence is <1% of all strains in a population, as suggested previously (20). In fact, globally, only 6.8% of 189 countries around the world reached this value (19). This situation is similar to that of other parts of the world, in which only a very small portion of the countries has reached the recommended percentage, suggesting that sequencing at least 0.5% of the cases, with a time between sample collection and genome submission of <21 days, could be a benchmark for SARS-CoV-2 genomic surveillance efforts for low- and middle-income countries (19), taking into account the high cost of sequencing reagents and equipment in these countries. In high-income countries, around 25% of the genomes were submitted within 21 days, in contrast to 5% of the genomes from low- and middle-income countries. Thus, the identification of patterns regarding the circulating genotypes in Latin America should be interpreted with caution due to the differences in SARS-CoV-2 surveillance systems, including sequencing capacity and sampling strategies, between countries in the region.
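The link between sampling depth and the chance of catching a rare lineage can be illustrated with a simple binomial sketch. It assumes simple random sampling of independent cases, which is a simplification made here and not the model behind the recommendation in (20); the function names are hypothetical.

```python
import math


def detection_probability(prevalence: float, n_sequenced: int) -> float:
    """P(at least one of n randomly sequenced samples belongs to the lineage)."""
    return 1.0 - (1.0 - prevalence) ** n_sequenced


def samples_needed(prevalence: float, confidence: float) -> int:
    """Smallest n for which detection_probability(prevalence, n) >= confidence."""
    if not (0.0 < prevalence < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("prevalence and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))


# Assumed scenario: a lineage at 1% prevalence, 95% desired detection probability.
n = samples_needed(prevalence=0.01, confidence=0.95)
print(n, detection_probability(0.01, n))
```

Under these assumptions, 299 randomly chosen genomes would be needed for a 95% chance of observing at least one sequence of a lineage at 1% prevalence. Real surveillance departs from random sampling, so the figure is illustrative only.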
Regarding the circulating genotypes, the reports on the diversity of lineages are similar to other studies in Latin America (9, 21-23) and other distant geographic regions (24, 25). For divergent SARS-CoV-2 genomes, all VOCs have been reported in all Latin American countries, resulting in a large diversity of genotypes circulating in each country (Figure 1). This is in line with the expected pattern of multiple and independent reintroductions due to population mobility within Latin America, as well as to and from other countries and continents (26-28). Besides, some genotypes have been reported with an epicenter in Latin America. As presented in Figure 2 in this region were minimal for the worldwide representation (Figure 2).
In Latin America, recurrent dissemination of SARS-CoV-2 through shared borders between countries has been evidenced (30), allowing rapid entrance and dissemination of different lineages to the different countries (31). Territories with no restriction to international interchange are more likely to introduce multiple SARS-CoV-2 variants, including variants of concern and/or interest and even lineages with mutations of concern and emerging variants with different mutation patterns (32). These introductions of VOCs to Latin America were more evident during the second half of the year 2021, when the Delta variant displaced other variants in several countries and became predominant as shown in Figure 2, while during the first semester of the year lineage predominance varied among these countries.
Although several epidemiological aspects can be associated with these patterns, the extensive opening of the borders during the middle of 2021 possibly favored the spread of new variants of concern in the region. Besides, the presence of multiple mutations that have been associated with increased infectivity and/or escape from immune response in variants such as Delta (33) helped this variant to displace other variants, as it occurred worldwide.
For other genotypes, at least 83 out of >1,500 PANGOLIN lineages have been reported with a high predominance in a Latin American country (Table 3). The full list of lineages is presented in the Supplementary material. As an example, lineage C.39 was predominant in Chile with 45.0% of all the sequences reported, followed by France (15.0%), Peru (12.0%), Guinea (10.0%), and Germany (8.0%). Of these 83 lineages, 51 had at least 80% of their sequences reported from a single Latin American country (Tables 3, 4). In the distribution by country, Brazil, Peru, Chile, Costa Rica, and Mexico have more reports of lineages with a frequency >80% locally.
For instance, Peru had 97.0% of the sequences reported for lineage C.40 (34,35). Lineages derived from the Gamma variant were also reported frequently in Brazil (e.g., P.1.4, P.1.7, P.4, and others). During 2020 the lineage B.1.1.389, which harbors the specific mutation spike:T1117, was reported as predominant in Costa Rica (86% of cases of this lineage were reported in this country) (9). Despite its dominance, few changes were predicted on the virus behavior (transmission, immune response, and other) and it was quickly replaced by the lineage Central America and subsequently by VOCs such as Alpha and Gamma (8   Jointly, these results indicate that specific mutations and the subsequent consolidation into lineages were detected in Latin America and evidenced by genomic surveillance in the region. Interestingly, 17 of these lineages were first reported in a different country from the one where they were subsequently found to be predominant (>80%). This includes neighboring countries, such as the case of lineage P.1.7.1, which was enriched in Peru but was first reported in Bolivia. This pattern was more frequent for Brazil, with eight lineages that were first reported in other countries, including some in Europe and Asia, but that became dominant in this country.
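The country-level predominance discussed above (the share of a lineage's sequences reported from a single country, with an 80% cutoff) can be sketched as follows. This is an illustrative reconstruction, not the procedure used with the PANGOLIN lineage database; the function names, the lineage labels, and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def country_shares(records):
    """Map each lineage to {country: fraction of that lineage's sequences}."""
    counts = defaultdict(Counter)
    for lineage, country in records:
        counts[lineage][country] += 1
    shares = {}
    for lineage, by_country in counts.items():
        total = sum(by_country.values())
        shares[lineage] = {c: n / total for c, n in by_country.items()}
    return shares


def predominant_lineages(records, threshold=0.8):
    """Return {lineage: (country, share)} where one country holds >= threshold."""
    result = {}
    for lineage, by_country in country_shares(records).items():
        country, share = max(by_country.items(), key=lambda kv: kv[1])
        if share >= threshold:
            result[lineage] = (country, share)
    return result


# Hypothetical toy records (lineage, country); not real GISAID data.
records = (
    [("LIN_A", "Peru")] * 9 + [("LIN_A", "Chile")] * 1
    + [("LIN_B", "Brazil")] * 5 + [("LIN_B", "Mexico")] * 5
)
print(predominant_lineages(records))
```

In this toy input only LIN_A passes the cutoff, because 9 of its 10 sequences come from one country, while LIN_B is split evenly between two countries.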
Tracking specific mutations in Latin American lineages that could be used as local markers may help to identify transmission networks locally and globally, highlighting the need for each country and territory to strengthen its sequencing and bioinformatic capacities. These capacities can also be of use to locally study other scenarios such as clinical profiles for COVID-19 patients (37), immune escape (38), long-term COVID-19 (39), identification of co-infections (40)    recombinant genomes (a recognized mechanism of viral diversity in coronaviruses).
Despite the reports of differences in the enriched genotypes in the first half of 2021, the emergence of new variants of the viral genome in Latin America was consistent with the rest of the world, as inferred from the mutation frequency (Figure 3A). During 2020, the mutation frequency for the S1 region of the spike gene was Following the gradual reopening of borders and worldwide travels, the frequency of infections and the appearance of mutations and new genotypes are expected to increase (45). Thus, more genome sequencing studies, including robust metadata collection, and more financial support are needed to continue with the surveillance of the pandemic in Latin America.
Finally, since most countries in this region are considered low-and middle-income countries, the impact of the COVID-19 pandemic on society has been devastating socially and economically (46). Genomic surveillance is pivotal as a powerful tool for decision-makers regarding the management of the pandemic in the Latin American context concerning social and economic measures, as well as practical decisions in terms of the diagnostic tools, treatments, and vaccines (4). On the other hand, local and prompt reports of emerging genotypes demonstrated the laboratory and bioinformatic capabilities in Latin American countries. These capabilities were developed locally in the last years for the surveillance of pathogens and other applications. Jointly, the local and international efforts to achieve the genomic surveillance of SARS-CoV-2 have contributed to the understanding of the dynamics of the pandemic in Latin America, which is an ongoing process.
In addition, the infrastructure related to molecular diagnostic techniques experienced a significant advance due to the pandemic. Before the pandemic, these techniques were only available in advanced clinical laboratories, but now they are more widely available and cost-effectively implemented in most clinical laboratories, on the way to becoming routine tests to study other pathogens and diseases (47).
Regarding limitations, the main drawback of this study is that we assumed that all sequences were comparable, with no segregation by experimental or bioinformatic conditions. The GISAID platform accepts a variety of conditions to upload genome sequences without restriction associated with the sample processing strategy, sequencing technology, genome assembler, variant callers and others, which were not considered here to assess their impact on the genotyping. Although previous reports have found differences among the pipelines used (48), we made the analysis using the whole set of available sequences as performed in other studies (19,49,50). Also, as an infectious disease, the clinical outcome of COVID-19 depends on the epidemiological triad: (i) environmental conditions (social behavior, restriction measures, management of cases, others), (ii) host factors (ethnicity, risk factors, genetic profile of HLA or ACE-II alleles, others), and (iii) the virus (genotype and mutations that impact transmission, immune response, others). Here we have only considered the SARS-CoV-2 genotypes in the period but other

Conclusions
In conclusion, with this study we have contributed to the genomic surveillance of SARS-CoV-2 in Latin America by providing 1,534 genome sequences from seven countries and the subsequent global analysis of circulating genomes mainly during 2021. For Latin America, patterns regarding several variants associated with multiple re-introductions, a relatively low proportion of sequenced samples, as well as an increase in the mutation frequency, are in line with worldwide data. Additionally, some genotypes such as the Gamma, Mu and Lambda variants and 83 lineages have emerged locally with a subsequent country-specific predominance. Regional efforts demonstrate the laboratory and bioinformatics capabilities for the genomic surveillance of pathogens that have been developed in Latin America, and these efforts are expected to continue during the current COVID-19 pandemic.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories

Ethics statement
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.