Genomic characterization of SARS-CoV-2 from an indigenous reserve in Mato Grosso do Sul, Brazil

Background The COVID-19 pandemic had a major impact on indigenous populations. Understanding the viral dynamics within this population is essential to create targeted protection measures.
Methods A total of 204 SARS-CoV-2 positive samples collected between May 2020 and November 2021 from an indigenous area in Mato Grosso do Sul (MS), Midwestern Brazil, were screened. Samples were submitted to whole genome sequencing using the Nanopore sequencing platform. Clinical, demographic, and phylogenetic data were analyzed.
Results We found the co-circulation of six main SARS-CoV-2 lineages in the indigenous population, with the Zeta lineage being the most prevalent (27.66%), followed by B.1.1 (an ancestral strain) (20.21%), Gamma (14.36%) and Delta (13.83%). Other lineages represent 45.74% of the total. Our phylogenetic reconstruction indicates that multiple introduction events of different SARS-CoV-2 lineages occurred in the indigenous villages in MS. The estimated indigenous population mortality rate was 1.47%. Regarding ethnicity, 64.82% of our cohort belong to the Guarani ethnicity and 33.16% to the Terena ethnicity, while other ethnicities represent 2.01%. There was a slightly higher proportion of males (53.43%) than females. We also observed that most patients (89.55%) presented signs and symptoms related to COVID-19, the most prevalent being cough, fever, sore throat, and headache.
Discussion Our results revealed that multiple independent SARS-CoV-2 introduction events occurred over time, probably due to indigenous mobility, since the villages studied here are close to urban areas in MS. The mortality rate was slightly below the estimate for the state in the period studied, which we believe could be related to the small number of samples evaluated, the underreporting of cases and deaths among this population, and the inconsistency of secondary data available for this study.
Conclusion In this study, we showed the circulation of multiple SARS-CoV-2 variants in this population, which should be isolated and protected as they belong to the most fragile group due to their socioeconomic and cultural disparities. We reinforce the need for constant genomic surveillance to monitor and prevent the spread of new emerging viruses and to better understand the viral dynamics in these populations, making it possible to direct specific actions.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic, caused by a new coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread globally on an unprecedented scale, reaching populations worldwide and causing thousands of deaths (1). Among the populations affected by the pandemic, indigenous peoples were substantially impacted, with irreparable human and cultural losses. Several factors, such as cultural, social and biological aspects, have a significant impact on the transmissibility and occurrence of infectious diseases in these populations, leaving them in a situation of greater fragility and vulnerability (2-4).
The impact of SARS-CoV-2 on global public health and economies has been profound (5). To contain the global spread of the virus and to reduce epidemic growth, public health interventions and non-pharmaceutical measures were adopted. Social distancing, border and travel restrictions, lockdowns, mask wearing, and contact tracing were some of the interventions that have shown effectiveness in mitigating the spread of COVID-19 (6,7). Within the first year of the pandemic, in December 2020, the U.S. Food and Drug Administration (FDA) issued the first emergency use authorization for the Pfizer-BioNTech COVID-19 vaccine in persons aged 16 years and older for the prevention of COVID-19 (8). In Brazil, COVID-19 vaccination started in January 2021, with indigenous people over 18 years old included among the priority groups (9). However, when compared to the general population, the indigenous population achieved lower vaccination coverage (4).
Between January 10, 2020 and December 2022, Brazil confirmed 36,960,888 COVID-19 cases and 697,894 deaths (9). Among the currently available public SARS-CoV-2 genome sequences, no sequencing data from COVID-19 cases in indigenous people were found in the literature, evidencing the paucity of data and reinforcing the need to expand genomic surveillance among ethnic minorities and underserved and isolated groups.
Brazil has an estimated indigenous population of 1,108,970 people (0.5% of Brazil's population of 214,300,000) living in indigenous areas, subdivided into more than 300 ethnic groups that speak more than 274 different languages (3, 10). In Mato Grosso do Sul (MS) state, in Midwestern Brazil, this population represents approximately 3% of the state's population. In MS, indigenous villages are located close to urban areas, where social and commercial relations with non-indigenous people could contribute to the spread of COVID-19 among indigenous peoples. About 65% of the indigenous population lives in the south of the state, where the city of Dourados is located (11). In this city, around the Dourados-Itaporã highway, lies the Dourados Indigenous region, the largest Brazilian peri-urban indigenous site, with a total area of 3,474.59 acres and a population of approximately 18,000 inhabitants living in the Bororó and Jaguapiru villages (12-14) (Figure 1).
Mato Grosso do Sul state ranked first in the number of confirmed cases of SARS-CoV-2 infections and second in the number of deaths in the indigenous population (3), when compared to other states in Brazil during the period of this study (15). The genetic characterization of viruses is essential not only for developing vaccine and antiviral protocols aiming for efficient treatments and diagnosis but also for monitoring and controlling disease outbreaks, following their evolution, and supporting decision-making (16).
Brazilian indigenous peoples have suffered heavily from the impacts of COVID-19 since the beginning of the pandemic. This research aimed to genomically characterize the SARS-CoV-2 strains circulating in the indigenous population of Mato Grosso do Sul state, mainly in the Dourados Indigenous Reserve area, from May 2020 to November 2021.

Clinical and sociodemographic data
The sociodemographic and clinical data of the participants in this study were extracted from data spreadsheets released by the Special Indigenous Health District (DSEI) of Mato Grosso do Sul, a decentralized management unit of the Indigenous Health Care Subsystem (SasiSUS) in Brazil. The data are collected by indigenous health agents, who are part of the staff of the Special Indigenous Health Secretary (SESAI). Each health agent monitors a set of households and their residents. The indigenous health data collected by the agents are stored in the Information System for Indigenous Health Care (SIASI) and in the Information System of the National Immunization Program (SI-PNI), preserving the confidentiality of individuals/patients. This study was approved by the National Research Ethics Committee (CONEP) with identification number 4.584.624.

Sample collection and molecular diagnostic assays
Clinical samples from indigenous patients with suspected SARS-CoV-2 infection residing in indigenous areas of Mato Grosso do Sul state were collected from May 2020 to November 2021 for COVID-19 diagnosis and whole genome sequencing. The biological material was collected following the workflow already established in the Basic Indigenous Health Units (UBSI) for detecting SARS-CoV-2 using qRT-PCR (17). Viral RNA was extracted from nasopharyngeal swabs using the QIAamp Viral RNA Mini Kit (QIAGEN), following the manufacturer's instructions. COVID-19 diagnosis was performed using one of three qRT-PCR protocols: (1) the Allplex 2019-nCoV assay (Seegene) targeting the envelope (E), the RNA-dependent RNA polymerase (RdRp) and the nucleocapsid (N) genes; (2) the SARS-CoV2 (E/RP) assay (Bio-Manguinhos/Fiocruz) targeting the E and RP genes; and (3) the BioMol oneStep/COVID-19 kit (Institute of Molecular Biology of Paraná (IBMP)) targeting the ORF-1ab and N genes. All protocols were performed following the manufacturers' instructions.

Whole-genome sequencing
Positive samples were selected for sequencing based on the Ct value (≤25). SARS-CoV-2 sequencing was performed using Oxford Nanopore technology. Briefly, the SuperScript IV Reverse Transcriptase kit (Invitrogen) was initially used for complementary DNA (cDNA) synthesis, following the manufacturer's instructions. The cDNA generated was then subjected to multiplex PCR using the Q5 High-Fidelity Hot-Start DNA Polymerase (New England Biolabs) and a set of specific primers designed by the ARTIC Network for sequencing the complete SARS-CoV-2 genome (ARTIC Network version 3), as previously described (18). Amplicons were purified using 1x AMPure XP beads (Beckman Coulter) and quantified on a Qubit fluorometer (ThermoFisher) using the Qubit dsDNA HS assay kit (ThermoFisher). DNA library preparation was performed using the ligation sequencing kit LSK109 (Oxford Nanopore Technologies) and the native barcoding kit (NBD104 and NBD114, Oxford Nanopore Technologies). Sequencing libraries were loaded into an R9.4 flow cell (Oxford Nanopore Technologies). In each sequencing run, we included negative controls to check for possible contamination; these showed less than 2% mean coverage.

Genome assembly and lineage analyses
Raw sequencing files were basecalled using Guppy v3.4.5, and barcode demultiplexing was performed using qcat. Consensus sequences were generated by de novo assembly using Genome Detective (19), which uses DIAMOND to identify and classify candidate viral reads in broad taxonomic units, using the viral subset of the Swissprot UniRef protein database. Candidate reads were next assigned to candidate reference sequences using NCBI blastn and aligned using AGA (Annotated Genome Aligner) and MAFFT. Final contigs and consensus sequences were available as FASTA files.
To ensure the quality of the genome sequences generated in this study and to guarantee the highest possible phylogenetic accuracy, only genomes >29,000 bp with <1% of ambiguities were considered. The genomes were submitted to the Pangolin COVID-19 Lineage Assigner Tool v.3.1.142 to confirm the variant classification.
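The inclusion criteria above can be expressed as a simple filter. The following is a minimal sketch, not the pipeline used in the study: it assumes that length is counted over the whole consensus sequence and that any character other than A, C, G or T counts as an ambiguity. The function name `passes_qc` and its parameters are hypothetical.

```python
def passes_qc(sequence, min_length=29000, max_ambiguity=0.01):
    """Return True if a consensus genome is longer than min_length
    and has an ambiguity fraction below max_ambiguity.

    Assumption: any symbol other than A, C, G, T is an ambiguity.
    """
    sequence = sequence.upper()
    length = len(sequence)
    # The criterion is strictly greater than 29,000 bp.
    if length <= min_length:
        return False
    ambiguous = sum(1 for base in sequence if base not in "ACGT")
    # The criterion is strictly less than 1% ambiguities.
    return ambiguous / length < max_ambiguity


if __name__ == "__main__":
    # Toy sequences standing in for consensus genomes.
    good = "ACGT" * 7400 + "N" * 100   # 29,700 bp, about 0.3% ambiguous
    short = "ACGT" * 7000              # 28,000 bp, too short
    print(passes_qc(good), passes_qc(short))
```

A genome that fails either threshold would be excluded before lineage assignment and phylogenetic analysis.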

Phylogenetic analysis
The datasets used for the phylogenetic analysis included complete Brazilian SARS-CoV-2 genome sequences retrieved from the GISAID database. Reference genomes of each variant were added according to the GISAID initiative using the region-specific download source on the website. Nucleotide sequences were aligned using MAFFT (20) and submitted to IQ-TREE2 for maximum-likelihood (ML) phylogenetic analysis (21), employing the general time reversible model of nucleotide substitution (GTR + F + R4), selected according to the Bayesian Information Criterion (BIC) by the ModelFinder application (22). Branch support was assessed using Ultrafast Bootstrap (UFBoot) with 1,000 replicates. TreeTime (23) was used to transform this ML tree topology into a dated tree using a constant mean rate of 8.0 × 10⁻⁴ nucleotide substitutions per site per year, after excluding outlier sequences.
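To give a sense of scale for the fixed clock rate, the short sketch below converts it into expected substitutions per genome. This is an illustration only and is not part of the original analysis. The genome length of 29,903 nt (the Wuhan-Hu-1 reference) is an assumption not stated in the text, and the function name is hypothetical.

```python
CLOCK_RATE = 8.0e-4      # substitutions per site per year (from the text)
GENOME_LENGTH = 29903    # assumption: Wuhan-Hu-1 reference length in nt


def expected_substitutions(years, rate=CLOCK_RATE, genome_length=GENOME_LENGTH):
    """Expected number of substitutions accumulated along a lineage
    over the given time span under a strict, constant clock."""
    return rate * genome_length * years


if __name__ == "__main__":
    per_year = expected_substitutions(1.0)
    per_month = expected_substitutions(1.0 / 12.0)
    print(f"per year: {per_year:.1f}, per month: {per_month:.1f}")
```

Under these assumptions the rate corresponds to roughly 24 substitutions per genome per year, or about two per month.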

Data availability statement
All sequences generated and used in the present study are listed in Supplementary Table 1, along with their GISAID sequence IDs, dates of sampling, and the originating laboratories.

Sociodemographic and COVID-19 clinical data in the indigenous population
For the clinical and sociodemographic analysis, data from the 204 indigenous patients were obtained. By village, 55.67% of participants were from Jaguapiru and 32.51% were from Bororó (Figure 1 and Table 1). Other communities added up to 11.82% (Table 1). Of the participants, 64.82% belong to the Guarani ethnicity, while 33.16% belong to the Terena ethnicity. Other ethnicities represent 2.01% (Table 1).
We also compared gender information and found a slightly higher proportion of males (53.43%) than females. When stratified by age group, most of the indigenous participants were between 18 and 39 years old (46.57%), followed by those between 40 and 60 years (31.37%). Participants under 18 years represent 14.71%, and older adults (over 60 years) represent 7.35% of the participants (Table 1). This may directly reflect the community structure, which is characterized by a younger age structure due to higher birth rates and early mortality (24).
We also analyzed close contacts with suspected or confirmed COVID-19 cases and the travel history of the reported cases (Table 1). No travel history in the 14 days before the onset of symptoms was reported, indicating that all infections were probably acquired within the villages (Table 1). A total of 69.62% of the patients reported close contact with suspected cases, and 79.29% reported contact with a confirmed COVID-19 case (Table 1).

SARS-CoV-2 phylogenetic inferences and lineage diversity in indigenous peoples
A total of 204 SARS-CoV-2 positive samples from the indigenous population were included in this study, including samples from 3 patients who died, constituting a mortality rate of 1.47%. To understand the dynamics of SARS-CoV-2 in the Brazilian indigenous population, we analyzed clinical and sociodemographic data alongside a phylogenetic analysis using a data set comprising available representative genomes from Brazil, including the genomes sequenced in this study.
All 204 samples that were qRT-PCR positive for SARS-CoV-2 had a Ct value compatible with sequencing (≤25). We retrieved 188/204 nearly complete genomes (Figure 2). Lineages were assigned using Pangolin (see text and Table 2). We performed a maximum likelihood phylogenetic analysis using a dataset of 1,034 nearly complete SARS-CoV-2 genome sequences from all five regions of Brazil, including the 188 sequences obtained in this study. Our phylogenetic reconstruction indicates that multiple introduction events of different SARS-CoV-2 lineages occurred in the indigenous villages in Mato Grosso do Sul state through time. These data suggest continuous mobility of people between the villages and urban areas in the Dourados Indigenous Reserve area. The increase in the frequency of SARS-CoV-2 variants detected over time is shown in Figure 3.

Discussion
Brazilian indigenous peoples have different responses to new diseases, raising concern about COVID-19 in these populations (25). It is extremely important to obtain information on the incidence of different pathogens that infect indigenous peoples. This is one of the pillars of our study, in which we sought to identify and report SARS-CoV-2 variants that circulated within the indigenous villages in Mato Grosso do Sul and to correlate them with clinical and sociodemographic data. Our study took place mostly in the Dourados Indigenous Reserve area, which, like other indigenous reserves, is heavily impacted by social inequalities and the spread of communicable diseases (26). We obtained sequences from 188 samples from indigenous peoples, identified 13 different SARS-CoV-2 lineages circulating in the indigenous population from the Dourados Indigenous Reserve area, and showed multiple introductions of these lineages into the reserve area.
During the study period, Brazil reported 22,076,863 cases and 614,186 deaths caused by SARS-CoV-2, a mortality rate of 2.78%, while MS reported 378,715 confirmed cases and 9,681 deaths, a mortality rate of 2.56%. According to data from the Ministry of Health, Special Secretary for Indigenous Health (27), from the start of the pandemic in Brazil (March 2020) until February 2023, there were 55,821 confirmed cases of COVID-19 and 839 deaths among the country's indigenous peoples, a national indigenous mortality rate of 1.50%. The state of Mato Grosso do Sul ranked first in the number of confirmed cases and deaths among indigenous peoples, recording 6,633 cases and 164 deaths (a mortality rate of 2.47%) (27). In our study, we recorded a 1.47% mortality rate (3/204). Although other studies have shown that indigenous people can present a higher mortality rate than the non-indigenous population for other infectious diseases (28), we found a slightly lower mortality rate in our population, which could be due to limitations of this study. The difficulty of accessing indigenous villages and the custom of often not seeking medical care could have led to less diagnostic testing in this population, contributing to the underreporting of COVID-19 cases and deaths, which could also contribute to lower mortality rates (27). The young age structure reported in this population could be protective, since younger people tend to have a better response to infection and, consequently, a lower mortality rate than the older adult indigenous population (4). Although we found a slightly lower mortality rate in our cohort, this comparison should not be made without accounting for the differences cited above.
Despite the genomic surveillance efforts in Brazil, with cases being sequenced as soon as the first confirmed infections were detected in the country, there is still a paucity of genomic data from indigenous populations. According to data obtained from GISAID (www.gisaid.org), from January 2020 to March 2023, 177,575 SARS-CoV-2 sequences were generated and made public out of a total of 36,960,888 confirmed cases of COVID-19 in Brazil. These data show that only 0.515% of positive cases from the general population have been sequenced and shared in Brazil, and none of those are from the indigenous population. In this study, we obtained 188 nearly complete genome sequences, representing 2.83% of COVID-19 cases in this population in Mato Grosso do Sul (204/5,018), a higher percentage than that of sequenced SARS-CoV-2 cases among the general population in Brazil.
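The mortality rates quoted in this section are simple ratios of deaths to confirmed cases. The sketch below reproduces that arithmetic from the counts given in the text and, as an illustration that is not part of the original analysis, adds a Wilson 95% interval to show how uncertain a rate based on a small sample can be. The function names and the choice of the Wilson method are assumptions made for this example.

```python
from math import sqrt


def mortality_rate(deaths, cases):
    """Deaths as a percentage of confirmed cases."""
    return 100.0 * deaths / cases


def wilson_interval(deaths, cases, z=1.96):
    """Wilson score interval for a proportion, returned in percent.
    Assumption: z = 1.96 for an approximate 95% interval."""
    p = deaths / cases
    denom = 1.0 + z ** 2 / cases
    centre = (p + z ** 2 / (2 * cases)) / denom
    half = z * sqrt(p * (1 - p) / cases + z ** 2 / (4 * cases ** 2)) / denom
    return 100.0 * (centre - half), 100.0 * (centre + half)


# Counts as reported in the text: (deaths, confirmed cases).
COUNTS = {
    "Brazil, study period": (614186, 22076863),
    "MS, study period": (9681, 378715),
    "Indigenous, national": (839, 55821),
    "Indigenous, MS": (164, 6633),
    "This study": (3, 204),
}

if __name__ == "__main__":
    for label, (deaths, cases) in COUNTS.items():
        low, high = wilson_interval(deaths, cases)
        rate = mortality_rate(deaths, cases)
        print(f"{label}: {rate:.2f}% (interval {low:.2f}% to {high:.2f}%)")
```

With only three deaths among 204 cases, the interval for this study is wide, which is consistent with the caution expressed above about comparing it directly with state or national rates.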
The difficulties that place indigenous peoples in a state of greater fragility, such as the lack of basic sanitation, combined with their typical customs, such as housing shared by many people and the mutual sharing of general-purpose artifacts and personal objects, contribute significantly to viral infection and the spread of multiple strains of SARS-CoV-2 among these peoples (29). It is also important to note that indigenous peoples usually face a high incidence of infectious diseases such as malaria and tuberculosis, and of vaccine-preventable diseases (28). During the period of this study we observed a continuous circulation of multiple variants of SARS-CoV-2, possibly because of multiple introduction events and spread of the virus, facilitated by indigenous mobility. Mobility is more frequent in the Jaguapiru village because it is closer to the city of Dourados, with continuous circulation between the village and the city, mainly of younger men seeking work or donations in the city for their subsistence. This may explain why most patients who reported close contact with a suspected or confirmed case were from this village, which contributed significantly to the progression and dissemination of the disease there.
The pandemic has brought to light several vulnerabilities that indigenous communities face. When new variants of SARS-CoV-2 emerge, greater attention must be paid to this population and to the threat the new scenario poses to their health and well-being, as they often have limited immunity to diseases introduced from the outside world (28). In this study we identified 13 different lineages circulating among the indigenous population of MS from June 2020 to November 2021. Among these, the B.1.1 lineage predominated over the others. Starting in December 2020, the Zeta lineage was predominant until March 2021. The Zeta lineage was first identified in October 2020 in Rio de Janeiro state and carries the S:E484K mutation, which confers the ability to evade neutralizing antibody responses and has been associated with reinfection cases (23,24). Although this variant does not contain mutations that increase its transmissibility potential, it quickly spread across the country (25). In less than 2 months, it was detected in indigenous villages in the countryside of Mato Grosso do Sul state. In our study, the Gamma variant was first detected in March 2021, 2 months after it had first been detected in Brazil. It subsequently displaced the Zeta variant and remained predominant until October 2021, when the Delta variant was first detected in this population and became responsible for most COVID-19 cases in those villages (25). Among the 13 lineages described here, eight (Zeta, Gamma, Delta, B. were present in the Jaguapiru and Bororó villages, located in the Dourados Indigenous Reserve area (Figure 1).
Our results corroborate other studies describing the same pattern of lineage circulation in the general population in Brazil (30,31), and also show sustained transmission among indigenous peoples. We emphasize that Dourados is the largest city in the countryside of the state and is known as a connecting city on the route for commerce and agriculture, linking the south of MS state to the rest of Brazil. The virus spread rapidly among indigenous communities, leading to high infection rates and devastating consequences. Other studies have shown the disparities and limitations the indigenous population faces in dealing with COVID-19 when compared to the general population (4, 32-34), corroborating the vulnerability of a population that should be protected.
The COVID-19 pandemic has highlighted the existing inequalities and vulnerabilities faced by indigenous peoples in Brazil. Hence, there is a need to evaluate strategies to improve the health outcomes of the indigenous population. Comprehensive and culturally sensitive health policies that address the challenges faced by these communities, not only during sanitary crises, are required.
This study presents some limitations, such as the inconsistency or absence of secondary data in public spreadsheets; the scarcity of data on indigenous peoples; the scarcity of sequencing in the literature; the low number of positive samples collected; and the possibility of underreporting of cases and deaths caused by COVID-19 in the targeted population.
The real-time update feature of this platform during the pandemic period posed a limitation, as it was frequently not synchronized with case notification numbers. Consequently, numerous patients had incomplete data regarding their symptom onset dates, vaccination details, and vaccine coverage. Nevertheless, we addressed this issue by excluding cases with incomplete information from our study.

Conclusion
In this study, we showed the circulation of multiple SARS-CoV-2 variants in the indigenous population, which should be isolated and protected as they belong to the most fragile group due to their socioeconomic and cultural disparities. Unfortunately, for indigenous peoples the impact is even more severe, since it results in irreparable human and cultural losses. We highlight the inefficiency of the measures taken to protect this population from COVID-19 infection and reinforce the importance of paying attention to less favored and more exposed populations, so that public policy decision-makers can respond quickly and effectively in situations that put indigenous peoples at risk. We reinforce the need for constant genomic surveillance to monitor and prevent the spread of new emerging viruses and to better understand the viral dynamics in these populations, making it possible to direct specific actions.

FIGURE 1 Geographical distribution of the villages covered by the study.

FIGURE 2 Time-resolved maximum-likelihood tree of n=188 new SARS-CoV-2 whole genome sequences generated in this study together with n=846 reference strains retrieved from GISAID up to February 26th, 2023. New genomes are colored according to their lineage assignment.

TABLE 1
Demographic and clinical information from indigenous populations participating in this study.