Molecular epidemiology of continued Plasmodium falciparum disease transmission after an outbreak in Ecuador

To better understand the factors underlying the continued incidence of clinical episodes of falciparum malaria in E-2025 countries targeting elimination, we characterized the molecular epidemiology of Plasmodium falciparum disease transmission after a clonal outbreak in Ecuador. Here we study disease transmission by documenting the diversity and population structure of the major variant surface antigen of the blood stages of P. falciparum encoded by the var multigene family. We used a high-resolution genotyping method, “varcoding”, involving targeted amplicon sequencing to fingerprint the DBLα encoding region of var genes to describe both antigenic var diversity and var repertoire similarity or relatedness in parasite isolates from clinical cases. We identified nine genetic varcodes in 58 P. falciparum isolates causing clinical disease in 2013-2015. Network analyses revealed that four of the varcodes were highly related to the outbreak varcode, with identification of possible diversification of the outbreak parasites by recombination as seen in three of those varcodes. The majority of clinical cases in Ecuador were associated with parasites with highly related or recombinant varcodes to the outbreak clone and due to local transmission rather than recent importation of parasites from other endemic countries. Sharing of types in Ecuadorian varcodes to those sampled in South American varcodes reflects historical parasite importation of some varcodes, especially from Colombia and Peru. Our findings highlight the translational application of varcoding for outbreak surveillance in epidemic/unstable malaria transmission, such as in E-2025 countries, and point to the need for surveillance of local reservoirs of infection in Ecuador to achieve the malaria elimination goal by 2025.


Introduction
A recent global push for national malaria elimination in 21 countries by 2020 led to three being declared malaria-free (Algeria, El Salvador and Paraguay), but the target was not met for the remaining countries (1,2). These countries are now part of a renewed initiative launched by the WHO in 2021 supporting a total of 25 countries, known as E-2025 countries, with the shared goal to eliminate local transmission of malaria by 2025 by reducing the incidence of indigenous or locally transmitted cases to zero (2). These countries include Mexico, Panama, Ecuador, South Africa, Eswatini, Thailand, Malaysia, among others. Malaria transmission in many of these countries is epidemic/unstable with risks of outbreaks, parasite importation, and resurgent malaria (3)(4)(5).
Ecuador, one of the nine E-2025 countries in Latin America, did not meet the 2020 elimination target due to malaria resurgence since 2015 (2,6,7). Malaria elimination efforts in Ecuador have largely focused on tropical areas, specifically the northwest coast and the Amazon region (8)(9)(10)(11)(12). These areas border non E-2025 countries, Colombia and Peru, that still have endemic and moderate to low transmission (13)(14)(15)(16)(17)(18)(19)(20). Clinical cases caused by P. falciparum are mostly concentrated in the northwest coast and Ecuador has epidemic/unstable P. falciparum transmission, with P. vivax being the dominant species. Determining whether the continued incidence of P. falciparum clinical cases is due to imported or locally acquired parasites is of key public health interest to better understand disease transmission patterns and aid decision-making to allocate limited resources to achieve the E-2025 elimination goal.
Application of genotyping methods to epidemiological analyses can aid public health responses in such settings by informing on parasite diversity and relatedness to identify imported cases and characterize residual and/or resurgent disease transmission patterns with higher resolution than routine case incidence monitoring (21,22). Such epidemiological surveys can be impactful in delineating malaria transmission hotspots, if any, within an endemic area. Molecular surveillance to characterize malaria transmission patterns has relied on genotyping neutral molecular markers such as microsatellites or single-nucleotide polymorphisms (SNPs) as more cost-effective approaches than whole genome sequencing. Of relevance to low or epidemic transmission, neutral variation has been commonly used to inform about clonality and relatedness as well as track genomes spatially. Genes under selection can provide an alternative but complementary view of microevolution (23)(24)(25).
Molecular surveillance based on diversity of the genes encoding the major surface antigen of pathogens is a well-established microbiological paradigm typically used in virology (e.g. Influenza A, HIV-1, SARS-CoV-2) and bacteriology (e.g. Neisseria spp, Streptococcus pyogenes) because genotyping such antigens provides important information on recombination rates, the factors associated with these recombination events, and transmission dynamics. Employing this surveillance approach is however more complex in falciparum malaria where the major variant surface antigen of the blood stages known as PfEMP1 is encoded by the var multigene family with 40-60 var genes per genome (26) and extensive diversity of var genes exists in parasite populations (27). Indeed, var diversity has been shown to be correlated with transmission intensity with the highest diversity, in the order of tens of thousands of variants, seen in African populations and the lowest in the Americas, in the order of hundreds (27-31). Var genes and repertoires diversify by meiotic recombination during the obligatory sexual phase of the life cycle in the mosquito as well as by mitotic recombination. We define a fingerprint of the var repertoire of an isolate by genotyping the highly diverse DBLa tag of the P. falciparum var multigene family (29)(30)(31)(32)(33). This fingerprinting method we call "varcoding" requires a single PCR with degenerate primers followed by amplicon deep sequencing. We sample DBLa diversity of P. falciparum within and between human hosts, as well as measure similarity or relatedness between var repertoires we call varcodes. Bayesian statistics are used to account for variable sampling of all members of the multigene family (genomes contain 40-60 var genes) in field samples when amplifying low-quality parasite DNA with degenerate primers and to quantify the uncertainty around relatedness estimates ( Figure 1). This analytical approach was designed specifically for var genes under immune selection where alleles per se cannot be assigned since chromosomal positions are unknown, and commonly used approaches to infer relatedness such as identity-by-descent (IBD) are not straightforward to apply due to the complexities of this multi-copy var gene family.
Here, we apply varcoding for the first time in unstable, epidemic malaria to look at diversity and population structure of these immune evasion genes. Specifically, we characterize the transmission of clinical P. falciparum cases in Ecuador during and up to two years after an outbreak in 2012-2013, which was previously found to be clonal by microsatellite genotyping (i.e., caused by a single parasite lineage) (19). Analysis of varcode relatedness networks allowed us to define different genomic parasite lineages (or varcodes) circulating locally, as well as detect signatures of highly related parasites and possibly recently recombined genomes with respect to var repertoires. Parasites with the varcode of the outbreak clone or recombinant/highly related varcodes relative to the outbreak clonal lineage were predominantly associated with the continued incidence of clinical cases after the outbreak. Hot spots for varcode diversity were identified suggestive of the existence of local reservoirs of infection. Further comparative analyses to published data from historical South American isolates (30) elucidated possible origins of Ecuadorian parasites, demonstrating that the majority of clinical cases were due to local transmission and not recent importation.

Study design and sample collection
In this molecular epidemiological study, we examined P. falciparum isolates collected from 2013-2015 in Ecuador from individuals of all ages presenting with uncomplicated malaria cases confirmed by microscopy and/or PET-PCR (34). Details on the study population and data collection have been previously published (19,35,36). Briefly, these samples were collected during passive surveillance by the Ecuadorian Ministry of Health from consenting participants who lived in the areas where the samples were taken, were over 2 years old, agreed to participate and provided a blood sample (either venous blood or dried blood spot) and answered a basic demographic questionnaire (including places recently travelled and their address). Genomic DNA was extracted from venous blood or dried blood spots using a QIAamp DNA MINI KIT (QIAGEN, USA) as recommended by the manufacturer. The study was approved by the ethics committees at the Pontificia Universidad Catoĺica del Ecuador (Quito, Ecuador) and The University of Melbourne (Melbourne, Australia).

Var DBLa amplification and sequencing
Var genotyping PCR, sequencing details, and related data processing steps have been previously published (33, 37). Briefly, for each P. falciparum isolate, a single-step PCR to amplify the DBLa domains of var genes was performed using universal degenerate primers targeting homology blocks D (forward primer) and H (reverse primer) originally described in (38), with the addition of 10bp GS FLX Titanium multiplex identifier primer sequence for barcoding (39). The PCR reaction was prepared in a total volume of 40ml, containing 3mL of genomic DNA, MgCl 2 at a final concentration of 2mM, dNTPs at a final concentration of 0.07mM, each primer at a final concentration of 0.375mM and 3 units of Flexi DNA Taq polymerase (Promega). The cycling conditions were as follows: initial denaturation step of 2 min at 95°C was followed by 30 cycles of: annealing for 40 seconds at 95°C, extension for 90 seconds at 49°C, denaturation for 90 seconds at 65°C , and then a final extension step of 10 min at 65°C. The resulting individually barcoded DBLa amplicons (approximately 450-700bp in length) were pooled equimolarly and barcoded libraries were prepared using the KAPA Low-Throughput Library Preparation Kit (Kapa Biosystems). The libraries were sequenced on a MiSeq Illumina platform using the 2x300bp paired-end protocol and MiSeq Reagent kit v3 chemistry (Australian Genome Research Facility, Melbourne, Australia).
In addition, any P. falciparum isolate with low sequencing quality (< 10 DBLa types) was removed from the analysis. Therefore, from a total of 70 genotyped P. falciparum Ecuadorian Schematic diagram of the varcoding approach. For more details about each step, see Methods. The illumina sequencer stock image was created with BioRender.com.
isolates, 12 P. falciparum isolates with low DNA quality and/or poor sequencing quality were removed and we obtained var DBLa data for 58 isolates (82.9%). A total of 543 unique DBLa types identified in the 186 South American P. falciparum isolates [N = 58 Ecuadorian P. falciparum isolates from this study, N = 128 South American P. falciparum isolates previously published in (30)] were used for subsequent var analyses at the regional-level, and only the 195 unique DBLa types identified in Ecuador were used for Ecuador-specific analyses. To evaluate how well we sampled the true pool of var DBLa diversity (i.e., the true number of genetic variants circulating) in Ecuador and in South America, we used the R package vegan (41) to generate species accumulation curves where the number of unique DBLa types are plotted as a function of the number of sampled sequences. Plateauing of this curve indicates saturation and robust sampling of the diversity pool (i.e. sampling more sequences will not identify any new types).

Genetic relatedness analyses
Measuring pairwise type sharing To estimate varcode similarity or relatedness between all isolate pairs, we calculated the similarity index Pairwise Type Sharing (P TS ) (28), as adapted by He et al. (37) to account for differences in DBLa sampling across isolates (i.e, differences in isolate repertoire sizes), and unbiased Bayesian pairwise type sharing estimates (BP TS ) to further account for uncertainty in P TS estimates. P TS represents the proportion of shared DBLa types between an isolate pair. It is calculated directionally by dividing the number of shared DBLa types, N s , between two isolates a and b, by the total number of DBLa types in each isolate or repertoire size, N a or N b . Thus, for each pair of isolates we calculate P TS(a,b) = N s /N a and P TS(b,a) = N s /N b .
Varcode relatedness can also be explored more rigorously with unbiased Bayesian pairwise type sharing (BP TS ) (42,43). This approach uses Bayesian inference methods, which estimate repertoire overlap and uncertainty, and uses them in a subsequent P TS calculation, carrying that uncertainty forward. The prior distribution for repertoire size, used in inference, was informed by observations as follows. First, the median observed repertoire size in Ecuadorian isolates was 37 types, ranging from 11 to 43 (Supplementary Table S1). Second, the number of expected var genes with DBLa domains from whole genome sequencing data of the Honduran laboratory reference strain HB3 was 42 (26) and 50 var genes based on long-read PacBio sequencing of HB3 (44). And third, based on our sequencing data of 37 technical replicates of HB3, the median repertoire size or number of DBLa types per isolate was 39 (range 36-41 types), with 40 types consistently identified in the majority of replicates (range 21-37 replicates). We therefore used a uniform prior on repertoire sizes between 40 and 50 types, combined with the general Bayesian repertoire overlap framework (42) to produce unbiased estimates (posterior means). These were used to confirm our P TS estimates. As expected, the P TS and BP TS estimates were positively correlated (Pearson's correlation coefficient = 0.919, p < 0.001, Supplementary Figure S3). To measure uncertainty in central estimates, we computed a 95% highest density posterior interval (HDPI), a Bayesian version of confidence intervals, for each pairwise estimate. Like a frequentist confidence interval, the width of the HDPI provides a measure of uncertainty of each pairwise comparison. All posteriors were sampled using Markov chain Monte Carlo.

Interpretation of varcode relatedness measures
Every parasite isolate was compared to every other parasite isolate in the population to determine the proportion of shared DBLa types. Theoretically, pairwise comparisons resulting in P TS of 0 (0% shared types), 0.5 (50% shared types), and 1 (100% shared types) will reflect genetically distinct isolates with different var repertoires, isolates with recombinant or highly related var repertoires, and clones or genetically identical isolates with the same var repertoire, respectively. In practice, however, in lowtransmission areas often "fixed" relatedness thresholds may obscure true relatedness estimates because of inbreeding events and thus require confirmation by measuring the uncertainty around each estimate with methods like our Bayesian inference of P TS . We applied this approach to define "varcodes" as groups of isolates sharing ≥90% of their DBLa types (P TS ≥ 0.90), identifying putatively identical genomes within the margin of error of detection of 1-5 DBLa types in an isolate. We confirmed our interpretations of varcodes, recombinants/highly related, and genetically distinct isolates at the thresholds of 0, 0.5 and 0.90-1, respectively, by comparing to unbiased BP TS estimates (posterior means) and examining the HDPI.

Visualization of varcode relatedness networks
To visualize the varcode relatedness between isolates as determined by P TS or BP TS , we constructed networks using the R packages ggraph (45) and tidygraph (46) where isolates are depicted as nodes and edges as the P TS or BP TS values at a given threshold. The R package ggspatial (47) was used to plot spatial networks using latitude/longitude coordinates for sampling location. To visualize varcode relatedness of parasites over time we used the R package gganimate (48) to construct spatiotemporal relatedness networks. We generated a clustered heatmap based on the presence/absence matrix of DBLa types to visualize the genetic profiles of each isolate in Ecuador and each country in South America using the R package pheatmap (49) and the "complete" clustering method. Unrooted neighbor-joining phylogenetic trees based on pairwise genetic distance (calculated as 1-P TS ) were constructed using the R packages ape (50) and ggtree (51, 52).

Statistical analysis
We used R version 3.5.2 (53), base R, and the R packages dplyr (54), epiR (55) for all analyses. We used chi-squared tests for univariate analyses of categorical variables to compare proportions and for non-parametric tests to compare distributions of continuous variables between two groups (Mann-Whitney U test) or among k groups (Kruskal-Wallis test), with a Bonferroni correction for multiple comparisons.

Results
Limited var diversity in Ecuadorian P. falciparum populations

Defining nine distinct varcodes circulating in Ecuador and describing spatiotemporal disease transmission trends
To define distinct varcodes circulating locally in Ecuador, we estimated relatedness between two isolates' varcodes by calculating the similarity index, Pairwise Type Sharing (P TS ) and constructed relatedness networks to identify genomes with the same varcodes (P TS ≥0.90). We confirmed this by comparing varcodes derived from P TS to those derived from posterior mean BP TS estimates (Supplementary Figure S3), which include statistical uncertainty, and by examining the lower and upper bounds of the 95% highest density posterior intervals (HDPIs). This revealed nine genetically distinct varcodes in this study with 36 isolates having varcode1 (representing the clonal outbreak in Esmeraldas City, salmon pink varcode Figures 3 and Figure 4A) and at the other extreme varcode4, varcode5, varcode9 were seen only once ( Figure 4A, S4A). Our definition of varcodes proved to predict genomic lineages (identityby-descent ≥0.99) based on published whole genome sequence data in the case of 30 of the isolates (six varcodes) that were analyzed by both WGS (56) and varcoding. As expected, the outbreak varcode1 was clonal (indicated by HDPIs that included 1; Figure S5B), as previously demonstrated by microsatellite genotyping (19) and more recently by WGS (56). The outbreak varcode1 identified in Esmeraldas City was also identified in San Lorenzo, Esmeraldas (~150km from Esmeraldas City) in 2013, then Cascales, Sucumbios (>300km away) in 2014, and then in Tobar Donoso, Carchi (~150km away) in 2015 ( Figure 4B). Overall, persistent disease transmission of the same varcodes was observed both over time ( Figure 4B) and large distances ( Figures 3A-C). The median time between first and last identification of the same varcodes with any clinical case was 216 days or approximately 7 months (range = 190 -823 days) during the study period. This is in line with reports in Peru (57) and Colombia (18, 58) where clonal parasites were shown to be circulating up to five and eight years after first identification, respectively.
Varcode relatedness networks reveal signatures of highly related parasites and local disease transmission We next constructed spatiotemporal varcode relatedness networks to explore signatures of local disease transmission after the outbreak. Figure 4C shows that 15 isolates with four varcodes (varcode3, varcode4, varcode6, varcode7) clustered with the outbreak varcode1 at the threshold of P TS ≥0.50 ( Figure 4C). The remaining 7 isolates with different varcodes did not cluster in the network and were genetically distinct. These patterns were confirmed using BP TS estimates based on posterior means (Supplementary Figure S6A) and their corresponding 95% HDPIs (Supplementary Figures S5C-F, S6B, C).
To better understand the varcode relatedness signatures in the "outbreak cluster" we examined both P TS and BP TS estimates. In some instances, highly related varcodes shared~50% of the outbreak DBLa types, such as the outbreak varcode1 with Varcode relatedness networks in Ecuador (A) A network visualization of the varcode relatedness of P. falciparum isolates at the threshold of P TS ≥0.90 to define varcodes (see Methods). Every node represents a P. falciparum isolate and an edge represents the P TS value between two particular nodes/isolates. Isolates that cluster together (i.e., connected by edges) are considered to have the same varcode. Each color represents a different varcode. Two P. falciparum isolates belonging to outbreak varcode1 appear as outliers in the network due to undersampling of their DBLa types. varcodes3,4,6,7 (median P TS range=37-58%, median BP TS =45-70%, Figures S5C, D). This level of varcode relatedness would be consistent with the generation of a new varcode through outcrossing by conventional meiosis resulting in recombinant var repertoires of the outbreak clone. When examining the pairwise combinations of varcodes 3,4,6,7, sharing of types also indicates high relatedness (median P TS range=27-50%, median BP TS =32-64%, Figures S5C, D), pointing to the possibility of a prior cross followed by a backcross given the overall limited var diversity in the parasite population. However, the timing of such crosses is unknown. Interestingly, all the possibly recombinant and highly related varcodes (varcodes3,4,6,7) were identified in San Lorenzo with varcode7 being the main one, indicating this area is a transmission hot spot and a reservoir of parasites with diverse var repertoires ( Figure 3D). To further confirm the observed varcode relatedness signatures, we constructed a clustered heatmap to visualize the genetic profiles of each isolate based on the presence/absence of the 195 DBLa types, such that genetically similar isolates cluster together ( Figure 4D). This analysis confirmed that in the case of isolates identified as highly related to the outbreak varcode1, a proportion of outbreak DBLa types as well as different DBLa types were present. By contrast, the genetic profiles of isolates with varcodes 2, 5, 8, and 9 had more different DBLa types to all other varcodes, i.e., parasites with genetically distinct varcodes, with sharing of only <30% of types in most instances (median P TS range= 2-29% except for 2 pairwise comparisons 34-37%; median BP TS range=4-30% except for 5 pairwise comparisons 31-41%, Figures S5E, F).
Parasites with highly related/recombinant varcodes were most frequently associated with P. falciparum clinical episodes following the 2012-13 outbreak Prior to the outbreak there had been a steady decline in clinical cases; however, increased incidence of disease occurred after the outbreak. Therefore, we analyzed trends in the epidemiology of P. falciparum cases that occurred post-outbreak to understand if there were any key risk factors. Of the 25 individuals in our study that had a clinical P. falciparum episode post-outbreak, we had age data for 18 individuals (72%, Table 1). There was no significant difference (Kruskal-Wallis test, p = 0.65) in the median age of individuals experiencing clinical episodes caused by parasites with the outbreak varcode1 (19 years, range = 19 -57 years, N = 3 patients), a highlyrelated/recombinant of varcode1 (i.e., varcodes 3, 4, 6, or 7) (25 years, range = 17 -58 years, N = 13 patients), or by a different varcode (34.5 years, range = 32 -37 years, N = 2 patients). A greater diversity of varcodes (1-9 inclusive) was associated with incidence of clinical malaria post-outbreak than during the 2013 outbreak (varcodes 1,2,3). Indeed in 2014, 79% of the cases sampled were caused by either parasites with the outbreak varcode1 (21%) or with any of the four highly-related/recombinants of varcode1 (58%). The trend was similar in 2015 with 83% of the cases sampled caused by either parasites with the outbreak varcode1 (33%) or with any of the four highly-related/recombinants of varcode1 (50%), although our sampling of reported cases in 2015 after January was limited. Importantly, overall, we found that 80% of the cases sampled after the outbreak were caused by either parasites with the outbreak varcode1 (24%) or with any of the four highly-related/ recombinants of varcode1 (56%), especially with varcode7 (representing 57% of the cases caused by parasites with highlyrelated/recombinant varcodes). Thus, disease transmission after the outbreak in 2014 and into 2015 was predominantly associated with parasites with highly related/recombinant var repertoires of the outbreak clone, with most of these cases occurring in the San Lorenzo hotspot.  Table S1). Despite the difference in sampling years, the 543 unique DBLa types identified in these 186 isolates represent the diversity circulating in South American P. falciparum populations, as indicated by DBLa sampling accumulation curves approaching saturation (Supplementary Figure S7). An unrooted phylogenetic neighbor-joining tree showing pairwise genetic distance (1-P TS ) revealed distinct, country-specific clusters of isolates with related varcodes ( Figure 5B), consistent with previous analyses demonstrating geographic population structure (30). Thus, despite the relatively low var diversity, there is sufficient resolution to differentiate between the parasite populations from different South American countries and to explore the possible geographic origins at the level of country-specific DBLa types but also the unique combinations of these types as varcodes.

Comparative analyses to South
Varcoding resolved a total of 97 varcodes in South America (P TS ≥ 0.90), ranging from nine in Ecuador and Venezuela, to 56 varcodes in French Guiana and the varcodes were country-specific since none of the isolates from different countries clustered with isolates from another country at this threshold (Supplementary Figure S8). However, although identical varcodes were not identified in multiple countries, we were interested in examining varcode relatedness. To assess the overall sharing of types in the varcodes identified in Ecuador to those sampled in other South American parasite populations, we aggregated all possible DBLa types identified in the isolates comprising each Ecuadorian varcode (range: 34-47 types per varcode) and all DBLa types seen in any isolate from a given country (range: 112-249 types per country). The overall sharing of types was high, with the highest median sharing with Peru (53%, range: 26-72%) followed by Colombia (39%, range: 19-89%) and Venezuela (27%, range: 12-76%), and only 10% with French Guiana (range: 3-86%) except for varcode5 (86%) ( Figure 5D). Next we estimated P TS between all possible isolate pairs to examine whether any of the Ecuadorian parasites had varcodes that were genetically related to these historical South American parasites (P TS ≥ 0.50) and constructed regional varcode relatedness networks ( Figure 5C). We identified a genetically related Peruvian P. falciparum isolate that clustered with the "outbreak cluster", but especially with outbreak varcode1 (P TS = 0.66-0.75 with varcode1 isolates). This is consistent with previous analyses using microsatellites showing the outbreak source was possibly a residual parasite lineage circulating in Peru in 1999-2000 and in Ecuador in the early 1990s (13,19). More recent whole genome sequence data also points to the outbreak varcode1 circulating in Colombia in the early 2000s (56). No other South American isolate clustered with the "outbreak cluster" based on our network analysis, suggesting that there may be other locally circulating parasites in unidentified reservoirs in Ecuador, e.g. the parental varcodes of the possible recombinants. In the case of varcode3, the WGS study showed that isolates with varcode1 and varcode3 belong to distinct genomic lineages but are identical by descent across 80% of the genome thus belong to a "super-cluster" (56), which is consistent with our BP TS estimates of 70-84% type sharing ( Figure S5D). Additionally, the WGS study identified varcode6 and varcode7 in Colombian isolates from 2014-2016 and in 2016, respectively, but varcode4 was not identified in any of their Colombian isolates (56).
We were also interested to better understand the origins of those varcodes that did not cluster with the "outbreak cluster" since historically they may have been imported and may represent other locally circulating parasites that were not a direct result of onward transmission after the outbreak. We found evidence of two putative importations of parasites from neighboring countries, varcode9 from Colombia (Case EC53, Table 1), based on both genetic and epidemiologic data, and varcode5 (Case EC40, Table 1) the origin of which was less clear. The epidemiological data for the putative NA refers to no data collected for that particular epidemiological variable. *The self-reported infection location was recorded at the time of sample collection based on the answers provided by the study participants. **The putative origin as determined by varcoding.
infection location was recorded as Peru (Case EC40, Table 1), however based on our data this varcode is more related to historical P. falciparum populations from French Guiana and Venezuela than Peru. The remaining Ecuadorian isolates with varcodes 2 and 8 did not cluster with any other South American isolate. The epidemiological data for the putative infection location for varcode8 was recorded as Colombia (EC49 and EC50, Table 1), but these isolates did not cluster with Colombian isolates in our network analysis. Sharing of DBLa types in these varcodes with Colombian types was high (44% for varcode2 and 67% for varcode8, Figure 5D), providing evidence that parasites with varcode2 and varcode8 may represent residual parasites historically imported from Colombia. This is consistent with WGS data for isolates with varcode2, demonstrating they are identical by descent (IBD>0.99) with Colombian isolates from the early 2000s (56), suggesting past rather than recent importation. It is worth noting that varcode2 was identified in the hotspot of San Lorenzo, Esmeraldas confirming that this location may act as a reservoir of parasites with diverse var repertoires, as described above.

Discussion
Our investigation of the spatiotemporal incidence of clinical episodes of P. falciparum in Ecuador by varcoding supports the view that disease transmission after the 2012-2013 outbreak was sustained by parasites circulating in Ecuador, some of which may have had historical origins in neighboring countries. We observed persistence of the outbreak clonal lineage (identified with the same varcodes) and parasites with highly related varcodes predominantly associated with clinical disease after the outbreak. The observed varcode relatedness signatures may point to potential outcrossing of the outbreak lineage with other locally circulating parasites. Whether these recombinant varcodes resulted from sexual recombination events between parasites with outbreak varcode1 and genetically distinct parasites that were already circulating at low levels and/or in asymptomatic reservoirs in Ecuador [e.g (12)] or those that were previously imported, could not be ascertained from the current study population. Our results point to the need for resources to be focused locally in Ecuador to uncover the circulating reservoirs of infection. In this context, var genes will undoubtedly play a role in the persistence and virulence of these parasites. A role for human mobility must also be considered in the spread of P. falciparum in Ecuador, as parasites with the same varcodes were also observed across large distances (~150-300km) and two putative importations from Colombia and Peru were identified based on both epidemiological and varcoding data.
San Lorenzo was found to be a transmission hotspot likely due to mining activities and occupation-related travel in these areas, as well as its proximity to the Ecuador-Colombia border (4,35,(59)(60)(61). These findings point to San Lorenzo for both genetic surveillance and targeted interventions. Importantly, our results provide strong evidence for ongoing local transmission in Ecuador and provide the first baseline characterization of P. falciparum antigenic diversity and parasite varcodes circulating in the Ecuadorian northwest coast and Amazon regions. This forms a historical database that can be leveraged in future molecular epidemiology studies. It is worth noting that varcode5 identified in Orellana province, an area of the Amazon region that neighbors Peru, had a very different genetic profile to the other Ecuadorian varcodes (≥83% unique types). It is more genetically related to historical parasites from French Guiana and Venezuela pointing to highly diverse P. falciparum populations that are likely circulating in the Ecuadorian Amazon region and Ecuador-Peru border. Several recent outbreaks have occurred in the same border areas of Ecuador [in 2016 (9), 2018 (10), 2019 (11) and 2020 (62)]. Our findings of highly related and possible recombinant parasites of the outbreak clone causing clinical disease shortly after the outbreak may provide an explanation for how sustained epidemics of disease continue to occur locally. The importation of parasites combined with local transmission can clearly increase the pool of antigenic variants as well as overall genome diversity, although periodic outbreaks of local parasites may also provide sufficient conditions for this to occur locally in the absence of parasite importation.
We demonstrated that 58% of clinical cases sampled in 2014 immediately after the outbreak in Ecuador were caused by parasites with highly related or possibly recombinant varcodes. The same trend was observed in 2015, although our sampling was limited relative to the number of reported cases for the months sampled. Is this a chance finding, or a consequence of selection, i.e., drug or immune selection? Previously published drug-resistance marker genotyping data reported that all P. falciparum isolates in our study had genotypes associated with chloroquine-resistance (i.e., pfcrt 76T) and the majority had genotypes associated with sensitivity to sulfadoxine-pyrimethamine (i.e., pfdhps and pfdhfr) (36). Mutations in pfdhfr were only present in parasites with varcode2 and one of the highly related varcodes (varcode6) but not the other varcodes. Past antimalarial treatment for P. falciparum malaria in Ecuador included artesunate and sulfadoxine-pyrimethamine alone or in combination with primaquine until 2012 when a switch to artemether-lumefantrine +primaquine was introduced as the first-line treatment (63). Thus, it does not appear from our findings that drug selection is associated with the var population structure we observe in clinical cases after the outbreak. Therefore, we hypothesize that parasites with these highly related varcodes have new or antigenically novel combinations of var DBLa types and possibly alleles of single copy diverse antigen-encoding genes. This may provide an advantage in a population with variable levels of immune memory, as would be expected for these communities in Ecuador where seroprevalence of antibodies against P. falciparum has been reported to be as high as 22% (12). Supporting this view, prior network analyses and computational models combining evolution and epidemiology point to variant-specific immune selection defining var population structure even in low transmission, albeit to a much lesser extent than expected in high-transmission settings where selection acts strongly against recombinant or highly related repertoires due to cross-immunity (37, 64). The availability of var DBLa sequences provides the potential to test this hypothesis by serological methods that measure variant-specific immunity, as shown in serological studies in Papua New Guinea (65) and the Brazilian Amazon (66). In addition, characterization of antigenic variant var DBLa sequences could be leveraged in future gene expression studies to understand parasite virulence. Indeed, future investigations into a potential fitness advantage of varcode1 and how it may enhance malaria transmission are warranted, in addition to exploring the origins of more recent outbreaks and whether they are the same or new varcodes.
A previous study that included some of the same P. falciparum isolates from Esmeraldas City and San Lorenzo in the northwest coast described three main genetic clusters based on microsatellite genotyping (35). In contrast, when considering the same isolates, varcoding resolved six varcodes circulating. Similar observations have been described in an earlier study in Venezuela where sympatric parasites identical at neutral microsatellite loci were shown to have very different var repertoires (67). This is not surprising as varcoding is expected to provide higher resolution than microsatellite genotyping as we are looking at more loci, which are under selection. Importantly, our varcode membership designations are consistent with genomic lineages as defined by identity-by-descent (IBD) analyses (IBD ≥0.99) and "super-clusters" (IBD ≥0.80) from a recent whole genome sequencing (WGS) study of P. falciparum isolates from the Pacific Coast of Ecuador and Colombia collected in the early 2000s and in 2013-2017 (including 30 of the same isolates in this study) (56). Although P. falciparum WGS data exclude var data due to the difficulty of assembling highly diverse variant antigen genes, the WGS results are consistent with the population structure we infer from varcoding. To our knowledge this is the first direct comparison of parasite relatedness inference as determined by WGS IBD and varcoding P TS (and BP TS ) where, given the low transmission in this setting, defining population structure by neutral markers and varcoding appears to converge. This provides compelling evidence of the utility of varcoding and downstream P TS and BP TS analytical approaches in such settings. Indeed, it is noteworthy that varcoding predicted the same genomic lineages and resolved parasite relatedness in line with WGS analyses, with the added advantage of simultaneously providing surveillance information about the major variant surface antigen-encoding genes. An interesting line of inquiry beyond the scope of this study will be to explore whether IBD-based approaches can be applied to varcoding data, particularly in elimination settings where the population may be highly inbred and most infections are monoclonal.
Amplicon sequencing is being promoted as a potential costeffective approach for molecular surveillance to inform on transmission patterns, potential parasite importation and parasite relatedness for malaria elimination efforts. Varcoding is a targeted amplicon sequencing approach requiring a single PCR with degenerate primers to amplify 40-60 genes from the var  50. Every node represents a P. falciparum isolate and an edge represents the P TS value between two particular nodes/isolates. Isolates/varcodes that cluster together (i.e., connected by edges) are genetically related. (D) Relatedness of Ecuadorian varcodes to P. falciparum populations from South America. The relatedness of each varcode was measured by first concatenating all possible var DBLa types that were identified in the P. falciparum isolates comprising each varcode as well as concatenating all var DBLa types that were identified in each country. Then P TS was calculated between each varcode and each country. The color gradient denotes the P TS value for a particular comparison, with darker shades showing higher relatedness and the different colors corresponding to the different country comparisons. N refers to the number of var DBLa types.
multigene family encoding the most immunogenic protein family of P. falciparum. Here we demonstrate its potential as a cost effective and high-resolution method to examine P. falciparum antigenic diversity, parasite relatedness and genome-wide diversity patterns for molecular surveillance in low-transmission settings where highly related parasites are circulating. This will prove particularly useful in the context of changing patterns of human mobility and gene flow in the Americas where there is high demand for such a molecular surveillance method. Given that malaria elimination is achieved locally, varcoding may be useful in areas where WGS may not be conducted routinely, or where infrastructure and local capacity may not exist. Even in areas where WGS is routinely conducted, varcoding has the potential for screening samples and identifying the need for downstream WGS to save on resources. The tool will also be useful in other epidemic or low-transmission settings targeting elimination across the globe. Undoubtedly, going the distance to elimination by 2025 must be supported by appropriate molecular surveillance to better understand and track disease transmission as well as uncover the existence of local reservoirs of antigenically diverse parasites.

Ethics statement
The studies involving human participants were reviewed and approved by Pontificia Universidad Catoĺica del Ecuador and University of Melbourne. The patients/participants provided their written informed consent to participate in this study.

Author contributions
SR-P and KPD conceived and designed the study. FES and CAV-A carried out field work to obtain P. falciparum isolates and epidemiological metadata. SR-P and KET developed the varcoding methodology. SR-P and SLD performed the molecular experiments and sequencing; SR-P, FES and CAV-A curated the clinical and epidemiological metadata. SR-P performed the formal data analysis and visualization. EKJ and DBL designed and conducted the Bayesian statistical analysis. SR-P, FES, KET and KPD contributed to the interpretation of the data. KPD supervised the work. SR-P, FES. and KPD acquired funding. SR-P and KPD wrote the paper with contributions from all authors. All authors contributed to the article and approved the submitted version.