Seroprevalence, Prevalence, and Genomic Surveillance: Monitoring the Initial Phases of the SARS-CoV-2 Pandemic in Betim, Brazil

The COVID-19 pandemic has created an unprecedented need for epidemiological monitoring using diverse strategies. We conducted a project combining prevalence, seroprevalence, and genomic surveillance approaches to describe the initial pandemic stages in Betim City, Brazil. We enrolled 3,239 subjects in a population-based, age-, sex- and neighborhood-stratified, household, prospective, cross-sectional study divided into three surveys 21 days apart, each sampling the same geographical area. In the first survey, overall prevalence (participants positive in serological or molecular tests) reached 0.46% (90% CI 0.12–0.80%), followed by 2.69% (90% CI 1.88–3.49%) in the second survey and 6.67% (90% CI 5.42–7.92%) in the third. Underreporting reached 11-, 19.6-, and 20.4-fold in the first, second, and third surveys, respectively. We observed increased odds of testing positive in females compared to males (OR 1.88, 95% CI 1.25–2.82), while the single best predictor of positivity was ageusia/anosmia (OR 8.12, 95% CI 4.72–13.98). Thirty-five SARS-CoV-2 genomes were sequenced, of which 18 were classified as lineage B.1.1.28 and 17 as B.1.1.33. Multiple independent viral introductions were observed. Integrating multiple epidemiological strategies adequately described COVID-19 dispersion in the city. The results presented here have helped local government authorities guide pandemic management.


INTRODUCTION
Since its emergence in December 2019, the new human coronavirus has had a tremendous impact on humanity due to the pandemic nature of its infection, called COVID-19 (Zhou et al., 2020). The SARS-CoV-2 pathogen was described on January 24, 2020. In Brazil, the first case of COVID-19 was reported on February 26, 2020, in the city of São Paulo (Araujo et al., 2020). The virus spread rapidly, and the country had the highest number of cases and deaths in Latin America, experiencing its first peak wave in late July 2020. Although most patients were identified in the most prominent Brazilian cities, São Paulo and Rio de Janeiro, dispersion to other municipalities was quickly reported. Betim, a town located in the Minas Gerais State in Brazil with an estimated population of 439,340 in 2019, had its first reported SARS-CoV-2 cases on March 23, 2020, in two patients returning from Europe. Two months later, on May 23, 2020, only 73 confirmed cases had been reported, although 4380 suspected cases were identified in public databases, indicating limited testing availability. The Brazilian public healthcare system prioritized testing subjects with symptoms because diagnostic tests were scarce, particularly in the early days of the pandemic. Since data suggest that symptomatic cases represent only a fraction of persons infected with SARS-CoV-2, official statistics were expected to be underestimates (Wu et al., 2020). Several aspects may influence COVID-19 symptom presentation (Rossi et al., 2021). Epidemiological surveillance using prevalence studies is needed to evaluate the true extent of SARS-CoV-2 dispersion, significantly extending testing to asymptomatic subjects. Combining serological and molecular tests may be a more robust strategy to uncover viral diffusion in a territory, avoiding each test's kinetic detection limitations.
Valid prevalence and seroprevalence estimates for a population rely on two major factors: (i) a representative population sample and (ii) accurate diagnostic testing (Byambasuren et al., 2021).
While epidemiological investigation is essential for controlling COVID-19, genomic surveillance is also crucial. Robust SARS-CoV-2 variant monitoring can track viral evolution, detect new variants, describe patterns and clusters of transmission, and track outbreaks, among other uses. Therefore, it can provide actionable information on implementing a more targeted public health strategy that addresses local priorities through stakeholder engagement and mitigation efforts (Robishaw et al., 2021). We conducted a study combining seroprevalence, prevalence, and genomic surveillance approaches to understand the SARS-CoV-2 epidemic spread in Betim city.

MATERIALS AND METHODS
Seroprevalence and Prevalence
The Research Ethics Committee approved the present study under protocol CAAE 31459220.2.0000.5651. We conducted a population-based, age-, sex- and neighborhood-stratified, household, prospective, cross-sectional study repeated every 21 days in the same geographic area to determine the extent of SARS-CoV-2 transmission in Betim, Minas Gerais, Brazil (Figures 1A,B). All populated areas in the city were sampled. Three surveys were held: June 3-5, June 23-25, and July 13-15, 2020. The sample size (n = 1,080 per survey) was estimated with a sample-size equation for proportions, considering a dichotomous outcome (positive or negative), a population of 439,340 inhabitants, a confidence level of 90%, a maximum margin of error of 2.5%, and the lack of a priori information on the prevalence of SARS-CoV-2 in the municipality's population (the latter represented by p = q = 0.5). Random sampling was employed to ensure representativeness of the population, stratified by sex, age (0-5; 6-19; 20-39; 40-59, and 60 years or older) and city neighborhoods (Centro, Alterosas, Imbiruçu, Norte, Teresópolis, PTB, Citrolândia, Vianópolis, Icaivera, and Petrovale). Every census tract (population stratum created by Governmental agencies) was sampled with at least one address. In case of refusal or closed households, the closest home was selected. Thirty-six teams (each with one driver, one nurse, and one community health worker) actively sampled subjects at 1,080 addresses over 3 days. Clinical and epidemiological data were obtained using a questionnaire during interviews with participants or their legal guardians, who signed the Informed Consent. Biological samples were collected using a nasal swab for RT-PCR and capillary blood obtained by fingerstick for the serological test.
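As a hedged illustration of the sample-size estimate described above, the standard finite-population formula for a proportion, n = N z² p q / (e² (N − 1) + z² p q), reproduces n = 1,080 from the stated inputs (N = 439,340, 90% confidence, e = 2.5%, p = q = 0.5). That the authors used exactly this form is an assumption, and the function name below is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_finite(population, confidence, margin, p=0.5):
    """Sample size for estimating a proportion in a finite population.

    Assumed formula (not confirmed by the source):
        n = N z^2 p q / (e^2 (N - 1) + z^2 p q)
    """
    q = 1.0 - p
    # Two-sided critical value, about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    numerator = population * z**2 * p * q
    denominator = margin**2 * (population - 1) + z**2 * p * q
    # Round up so the margin of error is not exceeded.
    return math.ceil(numerator / denominator)


n_per_survey = sample_size_finite(439_340, confidence=0.90, margin=0.025)
print(n_per_survey)
```

With a population this large the finite-population correction changes the result by only a few subjects, so the target is driven almost entirely by the confidence level and margin of error.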
Associations of each variable of interest with surveys and positive status were assessed using chi-square tests. Odds ratios were estimated using logistic regression with the glm function. Spatial geostatistical modeling and prediction were carried out using the gstat and predict functions from the gstat package. All analyses were carried out in R software (version 4.1.1).
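For a single binary predictor, a logistic regression yields the same odds ratio as the cross-product ratio of the corresponding 2x2 table, and a Wald interval can be written in closed form. The sketch below illustrates that special case in Python rather than R; the counts are hypothetical, and the text does not state whether the reported intervals are Wald or profile-likelihood intervals.

```python
import math
from statistics import NormalDist


def odds_ratio_wald_ci(a, b, c, d, confidence=0.95):
    """Odds ratio and Wald CI from a 2x2 table.

    a: exposed, positive     b: exposed, negative
    c: unexposed, positive   d: unexposed, negative
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, not taken from the study.
or_hat, ci_low, ci_high = odds_ratio_wald_ci(20, 80, 10, 90)
print(f"OR {or_hat:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

An interval that includes 1, as in this toy table, corresponds to a non-significant association at the chosen confidence level.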

Genomic Surveillance
Whole viral genome amplification and DNA library preparation were carried out as described elsewhere (Moreira et al., 2021). Briefly, the QIAseq SARS-CoV-2 Primer Panel-QIAGEN kit was used to amplify positive samples, following the manufacturer's instructions. In total, 39 of the 84 detectable samples were eligible for library preparation based on Ct values ≤ 30. Library concentration was measured using the QIAseq Library Quant Assay-QIAGEN kit, and the fragment integrity and size were evaluated using a Bioanalyzer (Agilent Technologies, Waldbronn, DE). Sequencing was carried out on a MiSeq (Illumina, San Diego, CA, United States).
The raw data generated were filtered by Trimmomatic v0.39 (Bolger et al., 2014), which trimmed low-quality bases (Phred score < 30) and removed short reads (< 50 nucleotides) as well as adapters and primer sequences. Reads were then mapped against the SARS-CoV-2 reference genome (accession number: NC_045512.2) with Bowtie2 (Langmead et al., 2009). The resulting BAM files were manipulated with SAMtools, BCFtools (Li et al., 2009), and BEDtools (Quinlan and Hall, 2010) to generate consensus genome sequences. Bases with less than 10x sequencing depth were masked. In total, 35 of the 39 genome sequences presented coverage greater than 79% and average sequencing depth greater than 200x (Supplementary Table 1). The 35 consensus genome sequences were submitted to the PANGOLIN 2.0 lineage classification tool (database version February 2, 2021) (Rambaut et al., 2020).
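The masking and quality thresholds above can be sketched as follows. This is a minimal illustration, not the authors' pipeline (which used SAMtools, BCFtools, and BEDtools): the function names and toy data are hypothetical, and "coverage" is assumed to mean the fraction of positions with an unmasked base.

```python
def mask_low_depth(consensus, depths, min_depth=10):
    """Replace bases covered by fewer than min_depth reads with 'N'."""
    if len(consensus) != len(depths):
        raise ValueError("consensus and depths must have the same length")
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(consensus, depths)
    )


def coverage_fraction(sequence):
    """Fraction of positions that carry an unambiguous A, C, G, or T."""
    called = sum(base in "ACGT" for base in sequence.upper())
    return called / len(sequence)


def passes_qc(sequence, depths, min_coverage=0.79, min_mean_depth=200):
    """Apply the coverage (> 79%) and mean depth (> 200x) thresholds."""
    mean_depth = sum(depths) / len(depths)
    return coverage_fraction(sequence) > min_coverage and mean_depth > min_mean_depth


# Hypothetical toy data: a 10-base consensus and its per-position depth.
toy_consensus = "ACGTACGTAC"
toy_depths = [350, 420, 8, 500, 610, 3, 480, 390, 450, 400]
masked = mask_low_depth(toy_consensus, toy_depths)
print(masked, coverage_fraction(masked), passes_qc(masked, toy_depths))
```

Masking before computing coverage means that a genome with many shallow regions fails the coverage threshold even when its average depth is high.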
To confirm the PANGOLIN identification and further contextualize the diversity of lineages circulating in Betim, we performed a set of phylogenetic analyses. First, a global dataset was assembled from a subset of high-quality data available on GISAID and the newly generated genomes (n = 3,814). This dataset contained all Brazilian sequences and one sequence per week for each country, as available on GISAID until January 12, 2021. These sequences were aligned with MAFFT v7.475 (Katoh and Standley, 2013), and a maximum likelihood tree was inferred with IQ-TREE 2 (Minh et al., 2020), under the GTR+F+I+G4 model (Tavaré, 1986; Yang, 1994). The Shimodaira-Hasegawa approximate likelihood ratio test (SH-aLRT) was used to assess the statistical support of branches (Guindon et al., 2010).
Maximum likelihood trees were inferred from these datasets, and their temporal signal was evaluated with TempEst v1.5.3 (Rambaut et al., 2016). Time-scaled phylogenies were then inferred from these datasets with BEAST v1.10.4, using: (i) the HKY+I+G4 nucleotide substitution model (Yang, 1994), (ii) the strict molecular clock model, (iii) the nonparametric coalescent skygrid tree prior (Gill et al., 2013), and (iv) a symmetric discrete phylogeographic model (Lemey et al., 2009). A normal prior distribution (mean = 1.13 × 10⁻³; std = 5.1 × 10⁻⁴) on the clock rate was assumed, based on a previous estimate. The cutoff values of the skygrid tree prior were set based on the previously estimated dates for the emergence of each lineage. The number of grids of the tree priors was set to match the approximate number of weeks between the estimated dates of the lineages' emergence and the dates of the most recently sampled sequences (41 weeks, both datasets). Two and three independent chains of 200 million generations, sampling every 10,000 states, were run for datasets B.1.1.33 and B.1.1.28, respectively. Tracer v1.7.1 was used to verify mixing and convergence of chains (effective sample size > 200 for all parameters), which were then combined with LogCombiner v1.10.4 after removal of a 10% burn-in. Maximum clade credibility trees were generated with TreeAnnotator v1.10.4. All logs and trees are available at https://github.com/LBI-lab/SARS-CoV-2_phylogenies.git.
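The chain settings above imply the number of posterior samples available for summarizing each dataset. The following sketch works through that arithmetic; the function name is hypothetical, and the initial state that log files usually also record is ignored for simplicity.

```python
def retained_samples(chain_length, sample_every, n_chains, burnin_percent=10):
    """Posterior samples left after per-chain burn-in removal and pooling."""
    per_chain = chain_length // sample_every
    # Integer arithmetic avoids floating-point rounding of the burn-in size.
    burnin = per_chain * burnin_percent // 100
    return n_chains * (per_chain - burnin)


# 200 million generations sampled every 10,000 states, 10% burn-in.
samples_b1133 = retained_samples(200_000_000, 10_000, n_chains=2)
samples_b1128 = retained_samples(200_000_000, 10_000, n_chains=3)
print(samples_b1133, samples_b1128)
```

Each chain contributes 20,000 logged states, of which 18,000 remain after burn-in removal, so the combined logs hold 36,000 samples for B.1.1.33 and 54,000 for B.1.1.28 under these assumptions.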

RESULTS
Seroprevalence and Prevalence
Evaluation of clinical and epidemiological data showed no significant difference across surveys in the presence of any prior health condition (pneumopathy, chronic neurological disease, pregnancy, postpartum status, chronic cardiovascular disease, chronic kidney disease, obesity, asthma, immunodepression, chronic liver disease, diabetes, hypertension, transplantation, cancer, or any comorbidity), indicating that sampling was consistent, since these conditions were not expected to change over the study period (Table 1). Four symptoms (cough, sore throat, myalgia, and rhinorrhea) and contact with a symptomatic person increased across surveys, while international travel decreased. Prevalence and seroprevalence increased across surveys.
Sampling was conducted in the early stages of the pandemic, as seen in the number of absolute reported cases (Figure 2A). Cumulative confirmed cases were underestimated (Figure 2B). In the first survey, overall prevalence (participants positive in serological or molecular tests) reached 0.46% (90% CI 0.12-0.80%), followed by 2.69% (90% CI 1.88-3.49%) in the second survey and 6.67% (90% CI 5.42-7.92%) in the third. Underreporting was estimated by comparing survey prevalence with official data, and its magnitude reached 11-, 19.6-, and 20.4-fold in the first, second, and third surveys, respectively (Figure 2B). An increase in overall prevalence was observed across most administrative regions (Figures 2C,D). Active transmission areas (RT-PCR positive participants) expanded over time (Figures 3A-C). By the third survey, almost all populated city areas were likely to have viral circulation (Figure 3C).
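The reported intervals are consistent with a normal-approximation (Wald) interval for a proportion at 90% confidence. The sketch below is an illustration under assumptions: the positive counts (5, 29, and 72) are back-calculated from the reported percentages with n = 1,080 per survey and are not stated in the text, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_ci_percent(positives, n, confidence=0.90):
    """Point estimate and normal-approximation CI for a proportion, in percent."""
    p = positives / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin = z * math.sqrt(p * (1 - p) / n)
    return tuple(round(100 * x, 2) for x in (p, p - margin, p + margin))


# Assumed counts: positives back-calculated from the reported percentages
# with n = 1,080 per survey; they are not stated in the text.
for positives in (5, 29, 72):
    print(wald_ci_percent(positives, 1080))
# (0.46, 0.12, 0.8)
# (2.69, 1.88, 3.49)
# (6.67, 5.42, 7.92)
```

The Wald interval is simple but can behave poorly for very low prevalence, as in the first survey; a Wilson or exact interval would be a reasonable alternative in that range.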
We also evaluated whether clinical and epidemiological variables were associated with molecular or serological test positivity (Table 2). Several significant associations were observed, mostly with reported symptoms (fever, cough, sore throat, dyspnoea, myalgia, rhinorrhea, respiratory discomfort, nausea/vomiting, headache, prostration, and ageusia/anosmia). We also observed increased odds of testing positive in females compared to males (OR 1.88, 95% CI 1.25-2.82) and clear enrichment of positive cases in certain city regions (e.g., Imbiruçu and Teresópolis). Surprisingly, people with obesity were more likely to be positive (OR 3.33, 95% CI 1.68-6.59). The single best predictor of positivity was ageusia/anosmia (OR 8.12, 95% CI 4.72-13.98). Non-significant associations were also found (Supplementary Table 2).

Genomic Viral Surveillance
In total, 35 novel SARS-CoV-2 genome sequences were obtained (GISAID EPI_ISL_5416087-5416121). The sequences were classified by PANGOLIN 2.0 to assess the genetic diversity of SARS-CoV-2 circulating in Betim. Eighteen of the 35 genomes were classified as lineage B.1.1.28, while 17 were classified as B.1.1.33 (Probability = 1.0). Further, a maximum likelihood tree was inferred from the global GISAID dataset (Shu and McCauley, 2017). No difference in the dispersion pattern was observed across lineages.
The analysis supported these results, revealing Betim sequences clustering within several clades of these lineages and confirming the circulation of B.1.1.28 and B.1.1.33 during the first wave of the COVID-19 pandemic in the city (Figure 4). The spread of Betim sequences across the tree suggests that multiple independent introductions occurred in the town. Further, eight clades mainly composed of Betim sequences were inferred with variable degrees of statistical support (median SH-aLRT = 82.75, range: 0-100), suggesting the occurrence of local transmission in the city after initial introduction events. In addition to these clusters, nine introductions supported by single sequences were also detected. Most Betim sequences or clusters are closely related to sequences from Rio de Janeiro and São Paulo, two neighboring states connected by highways to Minas Gerais. To formally assess the dynamics of introduction and spread of SARS-CoV-2 in Betim, separate datasets for lineages B.1.1.28 and B.1.1.33 were evaluated. Regression between sampling times and genetic distances revealed that both datasets had moderate temporal signal (B.1.1.28: R² = 0.49; B.1.1.33: R² = 0.58), justifying molecular clock analysis.
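The temporal-signal check described above amounts to regressing root-to-tip genetic distance on sampling date. The sketch below shows that regression for a fixed root with hypothetical data; it is not a reimplementation of TempEst, which can also search for the best-fitting root, and all names and values are illustrative.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, x_intercept, r_squared): the slope estimates the
    substitution rate and the x-intercept the date of the root.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy**2 / (sxx * syy)
    return slope, -intercept / slope, r_squared


# Hypothetical tips: decimal sampling dates and distances (subs/site).
dates = [2020.25, 2020.33, 2020.42, 2020.46, 2020.50, 2020.54]
dists = [2.1e-4, 3.4e-4, 3.9e-4, 5.2e-4, 4.6e-4, 5.9e-4]
slope, root_date, r2 = root_to_tip_regression(dates, dists)
print(f"rate={slope:.2e} subs/site/year, root={root_date:.2f}, R2={r2:.2f}")
```

A positive slope with a moderate R² is usually taken as enough signal to proceed to a formal molecular clock analysis, where the rate is then estimated jointly with the tree.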
The time-scaled phylogeographic analysis performed with dataset B.1.1.28 suggests this lineage emerged on February 22, 2020, in São Paulo (95% highest posterior density, HPD: February 11, 2020-March 05, 2020; geographic model posterior probability, PP = 0.91), later spreading to other Brazilian states (Figure 5A). The phylogeny reveals that two introduction events, dated between April 19, 2020 (95% HPD: April 17, 2020-May 11, 2020) and April 22, 2020 (95% HPD: April 20, 2020-May 27, 2020), led to the emergence of Betim clusters (harboring between two and six sequences). Additionally, four introductions related to single sequences were detected. The phylogeographic model suggests that three introductions occurred from another Brazilian region to Betim, in addition to one event from Rio de Janeiro (RJ), one from São Paulo (SP), and one from foreign sequences. All events presented high statistical support (PP > 0.92 for all events).

DISCUSSION
Betim is a medium-sized Brazilian city (439,340 inhabitants, 343 square kilometers) crossed by national roads connecting major Brazilian cities and serving as a local hub for the Brazilian Public Health System. Understanding its pandemic dynamics may provide relevant information for municipalities with similar features. Here, we estimated the overall prevalence of active infections and seroprevalence, and conducted genomic surveillance before the first pandemic wave in Betim. Brazilian molecular diagnostic capacity was insufficient in the first months of the pandemic (Grotto et al., 2020). Therefore, COVID-19 cases may have been included in the official statistics as severe acute respiratory infection cases with unknown etiology. Data until May 2020 indicated a positive association between higher per-capita income and molecular COVID-19 diagnosis, while severe acute respiratory infection cases with unknown etiology were associated with lower per-capita income, suggesting a possible diagnosis bias related to economic status. Inadequate diagnosis availability may lead to underreporting (Kupek, 2021). Our data indicated underreporting of up to 20-fold.
No studies have been conducted in Brazil evaluating active infection prevalence using adequate sampling. Our study design was inspired by previous research conducted in Santa Clara, United States, using pooled samples (Hogan et al., 2020). Pooled PCR tests were initially suggested for use in asymptomatic people (Lohse et al., 2020) and were later recommended for surveillance studies in populations with low infection prevalence (Mutesa et al., 2021). Seroprevalence studies were conducted during the first wave in Brazil, which peaked in July 2020. Two of the highest city seroprevalences reported during the period were in Boa Vista (25.4% in June 2020) (Hallal et al., 2020) and São Luiz (40.4% between the end of July and August 2020) (da Silva et al., 2020), both in the northern area of the country. A nationwide survey carried out in May and June 2020 found seroprevalence lower than 2% during both surveys in all sampled cities neighboring Betim (less than 200 km away), corroborating our findings (Hallal et al., 2020). Furthermore, seroprevalences higher than 10% were found solely in towns in the North Region (Hallal et al., 2020). In December 2020, Manaus, the largest city in the North Region, experienced a resurgence of COVID-19 (Sabino et al., 2021) despite high seroprevalence, likely due to the gamma variant (Faria et al., 2021).
Previous seroprevalence studies have indicated ethnic and socioeconomic bias in SARS-CoV-2 infection in Brazil since the pandemic's beginning (Amorim Filho et al., 2020; Horta et al., 2020). Results from Rio de Janeiro in April 2020 indicated that younger blood donors with lower education levels were more likely to test positive for SARS-CoV-2 antibodies (Amorim Filho et al., 2020). A nationwide study revealed that the poorest quintile was 2.16 times more likely to test positive, with the lowest risks among white, educated, and wealthy individuals. Likewise, we found one of the highest prevalences in the poorest neighborhood, Teresópolis, which includes the largest slum of the city, where more than 23,000 people live.
Further modeling results showed higher infection rates among young adults, people of lower socioeconomic status, and people without healthcare access in the less developed North and Northeast areas until August 2020 (de Lima et al., 2021). Most of Betim's inhabitants (90.7%) are also younger than 59 years, but no age effect on infection rates was observed. Increased female infection odds were observed, although previous reports indicated a gender predisposition toward death in some Brazilian regions, with higher male risk (Baqui et al., 2020). Possible explanations are that 70% of the global health workforce are women (Lotta et al., 2021) and that pandemic perception and attitudes differ by gender (Galasso et al., 2020).
COVID-19 diffusion presents strong socio-spatial determinants. Relocation diffusion from more- to less-developed regions and hierarchical diffusion from countries with higher population and density have been relevant since early 2020 (Sigler et al., 2021). Data indicated a similar pattern in the São Paulo State, with contiguous diffusion from the capital metropolitan area and hierarchical diffusion with long-distance spread through major highways that connect São Paulo city with cities of regional relevance (Fortaleza et al., 2021). Modeling results revealed that São Paulo city may have accounted for more than 85% of the initial case spread in the entire country (Nicolelis et al., 2021). Betim is directly connected to São Paulo city by a main national highway, which may have contributed to COVID-19 diffusion.
Genomic surveillance is a powerful tool to elucidate viral dispersion patterns. The first sequencing work conducted in Brazil evaluated the first six positive individuals and reported the same predominant lineages as in Italy. Later, a study with samples collected until late April 2020 from different areas of the country showed the dominance of clade B-derived lineages. At the national level, the respective frequency of these clades was seen in a 98.98%/1.02% ratio extent of these lineages' dominance in the Brazilian scenario at the moment. Independent introductions also emphasize the importance of inter-state mobility barriers as a measure to control the epidemic.
Our study presents some limitations. First, the household survey is less likely to sample severe cases, thus underestimating symptomatic COVID-19. Second, all clinical data were self-reported, which may lead to reporting bias (Baker et al., 2004). Third, we could not sequence all PCR-positive samples due to low viral load and the sequencing technology employed, but we do not expect biased frequency estimation since no difference in mean viral load has been reported between the B.1.1.28 and B.1.1.33 lineages. A different scenario was observed later in 2020, after the detection of variants of concern associated with higher mean viral loads (e.g., P.1 or gamma variant) (Faria et al., 2021). Fourth, our study provides a limited picture of the local epidemic because of the short period covered by the surveys, although it was the moment when the city had the least reliable data regarding pandemic progression. Nevertheless, our study shows the potential of integrating different epidemiological inquiries (prevalence, seroprevalence, and genomic surveillance) to describe pandemic dispersion adequately. Moreover, our findings present original and relevant evidence that has helped local government authorities guide pandemic management.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Betim Ethics Research Committee. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.